THE SECOND FACET OF FORGETTING: 
A REVIEW OF WARM-UP DECREMENT 


JACK A. ADAMS¹ 
University of Illinois 


The interference theory of forget- 
ting assumes that the extraexperi- 
mental occurrences of S-R sequences, 
either before original learning of goal 
responses² or interpolated between 
original learning and recall, will induce 
a decrement at recall if their 
stimuli are the same or similar to 
those of the criterion task and re- 
sponses are antagonistic. Thus, the 
laws of forgetting reduce to the laws 
of proactive and retroactive inhibi- 
tion (Briggs, 1957; Bugelski & Cad- 
wallader, 1956; Osgood, 1949), with 
experimental extinction as the proc- 
ess whereby responses are weakened 
in interference paradigms (Adams, 
1952a; Briggs, 1954; Underwood, 
1948a, 1948b; Underwood & Post- 
man, 1960). More recently, Under- 
wood (1957) has shown the potency 
of proactive inhibition on the recall 
of verbal responses by demonstrating 
that the prior learning of verbal ma- 
terials has led us to greatly over- 
estimate the amount forgotten. Un- 
derwood’s 1957 study, combined 


1 Several psychologists read a draft copy of 
this paper and improved it with their thought- 
ful commentary. Acknowledgement is due to 
A. M. Barch, R. C. Davis, M. R. Denny, 
J. M. Digman, C. P. Duncan, J. C. Jahnke, 
and B. J. Underwood. 

2 The term “goal response” is used throughout 
this paper as synonymous with “test 
response” or “criterial response.” It is a feature 
of overt behavior which the experimenter 
records and uses as the dependent variable. 


with recent research by Underwood 
and Postman (1960) showing effects 
on verbal recall from expected sources 
of verbal interference outside the 
laboratory, have materially strength- 
ened the interference theory of for- 
getting. Additional evidence for the 
interference theory has been provided by 
Steinberg and Summerfield (1957) 
and Summerfield and Steinberg (1957, 
1959) who used nitrous oxide in the 
control of learned associations during 
interpolated rest. Osgood (1953, pp. 
593-597) presents a good review of 
research in support of the inter- 
ference theory where various tech- 
niques are used to control activities 
of the organism during the retention 
interval as a means of reducing op- 
portunities for learning competing 
responses. Other explanations of 
normal forgetting might eventually 
be shown to have validity also, but 
the preponderance of contemporary 
evidence lies in support of the inter- 
ference theory and it will be used with- 
out further qualifications through- 
out this paper as the mechanism by 
which goal responses are directly in- 
fluenced and weakened during a re- 
tention interval. 

The purpose of this paper is to re- 
view evidence for the view that warm- 
up decrement (WU) is a second por- 
tion of the retention loss, arising 
from conditions other than direct in- 
terference with goal responses. Irion 
(1948) points out that the very special 
circumstances of stimulus and 
response similarity required for in- 
terference, along with the amount of 
interfering activity required to pro- 
duce decrement in the originally 
learned responses, make it unlikely 
that the fortuitous experiences of 
everyday life outside the laboratory 
could induce significant amounts of 
one-factor forgetting. While the work 
of Underwood and Postman (1960) 
suggests that casual interference is a 
factor to be reckoned with, there is 
the strong prevailing sentiment in ex- 
perimental psychology, supported by 
research evidence, that hypothesizes 
WU as a second part of forgetting 
independent of direct interference with 
the goal responses. As we shall see, 
the support for this two-factor view 
is not as secure as it might be. 


HISTORICAL BACKGROUND AND 
DEFINITIONS 


The first systematic observations 
on WU arose from interest in fatigue 
and the characteristics of perform- 
ance curves under conditions of pro- 
tracted work, and they appear to 
have been made in the latter part of 
the 19th century by Kraepelin and 
his students (Arai, 1912). Studying a 
variety of tasks, these researchers ob- 
served that the initial segment of a 
performance curve was typified by a 
rapid rise in efficiency, followed by a 
much slower rate of increase or a de- 
cline when fatigue effects were pres- 
ent. They identified this initial 
rapid rise as WU, although in some 
cases it could have been considered a 
practice effect or simple reacquisition 
following one-factor forgetting. In- 
terestingly, these early investigators 
made an observation which enters the 
thinking of many later workers: that 
a rest period contains the simultaneous 
and opposing processes of beneficial 
recovery from decremental work 
effects, and loss of advantageous fac- 
tors whose reinstatement occurs dur- 
ing the warming-up period. 

Mosso (1906) reported anecdotal 
accounts on the need for poets and 
writers to warm up before a period 
of productive work could begin. 
Wells (1908) observed the rapid in- 
crease in initial postrest performance 
on a tapping test which, by this time, 
generally had become identified as 
WU. Thorndike (1914) in a chapter 
“Mental Work and Fatigue” gives a 
more careful definition than previous 
investigators: 

The best definition of “warming-up” as an 
objective act is that part of an increase of 
efficiency during the first 20 minutes (or some 
other assigned early portion) of a work period, 
which is abolished by a moderate rest, say of 
60 minutes (p. 66). 


One other quotation from Thorndike 
is particularly significant: 

It should also be noted that intellectual warm- 
ing-up in the popular sense refers rather to 
fore-exercise of other functions, in order to get 
materials and motives with which and by 
which the given function is to work, than to 
an intrinsic alteration of it (pp. 67-68). 


Thorndike’s definition of WU as a 
rapid increase of efficiency during 
the initial postrest period is con- 
sistent with that of earlier writers. 
Thorndike, in these quotations, 
makes the influential observation 
that the WU segment is something 
other than strengthening of goal re- 
sponses with practice, and he clearly 
makes this point in the second quota- 
tion (pp. 67-68) when he identifies 
intellectual WU as the fore-exercise 
of other functions. It is this identifi- 
cation of WU with factors support- 
ing goal responses which stands as 
the foundation of the two-factor 
theory of forgetting, and apparently 
Thorndike was the first to make it. 
Thorndike’s observations stand as 
the most important historical predecessors 
of contemporary views, but 
other investigators made observa- 
tions on warm-up too, with an occa- 
sional experiment. Watson (1919, 
pp. 354-355) assumed that WU ap- 
peared only for heavy muscular work 
and that the warming-up period was 
a time of increased glandular action. 
Robinson and Heron (1924) defined 
WU as “a rise in efficiency which is 
steeper and more temporary than the 
rise which can be seen, let us say, in 
successive daily performances” (p. 
81). Robinson (1934) essentially re- 
peated his 1924 views. Snoddy (1935) 
presented the first data from a rela- 
tively large group of subjects which 
showed WU following rest. He em- 
ployed a mirror-tracing instrument 
as his experimental device. Bell 
(1942) performed an experiment on 
the Rotary Pursuit Test (Melton, 
1947) on the effects of varying 
amounts of rest interpolated early 
and late in practice. Warm-up dec- 
rement, as measured by the differ- 
ence between the first and second 
postrest trial, was found to first in- 
crease and then decrease with 
amounts of interpolated rest rang- 
ing from 1 minute to 30 hours. This 
trend applied to both early and late 
in practice. 

Post-World War II research dis- 
played an accelerated interest in WU 
and produced more careful defini- 
tions, hypotheses concerning its un- 
derlying nature, and specific experi- 
mental tests. The modern investi- 
gators generally followed the leads of 
their predecessors. Ammons (1947a), 
in a miniature system of variables de- 
termining rotary pursuit perform- 
ance, measured WU as the difference 
between the score on the first post- 
rest trial and a point on the perform- 
ance curve estimated as the level 
that would have occurred had there 
been no need for warming-up. Irion 
(1948, p. 338) defines WU on the re- 
sponse side in terms of the greater 
slope of the initial segment of the 
postrest curve relative to the slope 
of the original learning curve at a 
corresponding level of initial profi- 
ciency. The response definitions by 
Ammons and by Irion amount to 
about the same thing and, along with 
their theoretical views to be dis- 
cussed subsequently, have been the 
mainstay of most workers in the area 
since the war. A significant feature 
of these definitions is that they do 
not imply an actual decrement from 
the last prerest trial to the first postrest 
trial and, in this sense, the common 
reference to “decrement” is a 
misnomer. Consistent with most 
early observations on WU, the definitions 
involve an expression of the 
sharp initial rise in a postrest performance 
curve and are independent 
of whether there is an overall gain 
or a loss over rest. It is a decrement 
only in the sense that initial postrest 
performance is below an expected 
level because of WU, and this ex- 
pected level is not always below the 
level on the final prerest trial. This 
is the interaction of work and WU 
effects over rest which drew the at- 
tention of Kraepelin and his associ- 
ates (Arai, 1912). Figure 1 illus- 
trates WU and its appearance under 
conditions of massed and distributed 
practice (from Adams, 1952b). The 
Rotary Pursuit Test was used and 5 
days of practice were administered, 
with 36 ten-second trials given each 
day. Massed practice was 6 minutes 
of continuous practice, and distrib- 
uted practice had a 40-second inter- 
trial rest interval. Eighteen subjects 
were in the massed group and 21 in 
the distributed group. These data 
are a good example of WU manifesta- 
tions, although a subsequent section 
Fig. 1. Illustrations of warm-up decrement under conditions of massed and distributed 
practice on the Rotary Pursuit Test. (From Adams, 1952b) 


will point out that motor WU has a 
different status at this time than 
verbal WU. The massed group shows 
several instances of reminiscence 
from the final prerest trial to the first 
postrest trial, but the steep, initial 
rise in each postrest segment is taken 
to be WU resulting from a decre- 
mental process opposing the gain 
over rest. Adams measured WU as 
the difference between the first post- 
rest trial and the score on the trial at 
the peak of the rise before the de- 
cremental segment begins. For the 
distributed group, however, the 
method of WU measurement can be 
the same or it can be measured as a 
decrement from the last prerest trial 
to the first postrest trial because 
reminiscence is absent (Barch, 1954; 
Reynolds & Adams, 1954). Being 
able to measure it as an actual decre- 
ment is somewhat more precise be- 
cause it does not involve judgments 
of the termination point of the WU 
segment. Digman (1959) replicated 
Adams’ study in most of its aspects 
and obtained the same trends. 
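
The two measurement conventions just described can be put as simple arithmetic. The sketch below is illustrative only: the function names, the scores, and the rule used to locate the peak (the first point at which the postrest scores stop rising) are assumptions made for the example and are not taken from Adams (1952b). The peak rule in particular stands in for the judgment of the termination point of the WU segment mentioned above.

```python
def wu_peak_method(postrest_scores):
    """WU as the rise from the first postrest trial to the peak of the
    initial postrest rise. The peak is taken, by assumption, as the last
    score reached before performance first turns downward."""
    if not postrest_scores:
        raise ValueError("need at least one postrest trial")
    first = postrest_scores[0]
    peak = first
    for score in postrest_scores[1:]:
        if score < peak:
            break  # the decremental segment has begun
        peak = score
    return peak - first


def wu_prerest_method(last_prerest_score, first_postrest_score):
    """WU as an actual decrement from the last prerest trial to the first
    postrest trial; meaningful only when reminiscence is absent."""
    return last_prerest_score - first_postrest_score


# Hypothetical scores (e.g., percent time on target), not data from any study.
massed_postrest = [30, 38, 43, 41, 39]
distributed_last_prerest = 60
distributed_postrest = [52, 58, 61, 62]

print(wu_peak_method(massed_postrest))
print(wu_prerest_method(distributed_last_prerest, distributed_postrest[0]))
```

With these made-up scores the two methods need not agree for the same group, which is one reason the choice of convention should be stated when WU values are compared across studies.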


EXPLANATORY HYPOTHESES 
Set 

WU must be defined in terms of 
operations independent of those for a 
one-factor forgetting interpretation 
which, for the interference hypothesis 
of forgetting, would be in terms of 
responses conflicting with goal re- 
sponses and causing extinction of 
them. Considering WU as a perform- 
ance level below that expected at the 
beginning of a postrest practice ses- 
sion, it is just as meaningful to regard 
it as a simple one-factor forgetting 
loss for the goal response, with WU 
being a completely superfluous notion. 
With the exception of Doré and Hil- 
gard (1938) and Hilgard and Smith 
(1942), pre-World War II investiga- 
tors demonstrated a lack of methodo- 
logical caution by simply assuming 
WU as a phenomenon separate from 
one-factor forgetting. The Iowa 
studies of psychomotor interference 
in the postwar era (Lewis & McAllister, 
1950; Lewis, McAllister, & 
Adams, 1951; Lewis, Shephard, & 
Adams, 1949; Lewis, Smith, & McAllister, 
1952; Shephard, 1950; Shephard 
& Lewis, 1950) exhibited a similar 
conservatism by suggesting an interpretation 
of WU consistent with 
the one-factor interference theory of 
forgetting. They held that the learn- 
ing of responses in a laboratory task 
involves the extinction of conflicting 
responses either from prior tasks 
learned in the laboratory or from ex- 
tralaboratory tasks. When a rest 
period is introduced the extinguished 
responses spontaneously recover some 
of their strength and, when postrest 
practice is resumed, the increased 
strength of these responses results in 
heightened conflict with the goal re- 
sponses and WU occurs. As postrest 
practice continues the conflicting re- 
sponses are once again extinguished 
and WU dissipates. While these one- 
factor views are parsimonious, and 
are therefore desirable, the parsimony 
may be unwarranted. The “other 
functions” which Thorndike (1914) 
identified with WU suggest that 
a one-factor interpretation is an insufficient 
explanation of WU, and 
Thorndike’s early view is given a more 
explicit, and testable, expression in 
the set hypothesis of WU. In the 
postwar era Irion (1948) gave the first 
operationally independent statement 
of WU in terms of a set state of the 
subject, and this definition was dis- 
tinct from a one-factor forgetting 
definition. The term “set” has a 
number of meanings in psychology 
(Gibson, 1941) but Irion provided a 
sufficiently sound operational defini- 
tion of set within the WU context to 
provide testable predictions. Although 
inhibition hypotheses of WU 
have been attempted, and will be dis- 
cussed, the set hypothesis has the 
most status and has been the frame- 
work for most of the systematic re- 
search on WU. 

If set is to be objectively assessed 
for its utility in the scientific descrip- 
tion of behavior, it must be defined 
in terms of manipulable environ- 
mental events, on the one hand, and 
objective measures of behavior on 
the other. Furthermore, its opera- 
tional definition on the environ- 
mental side must be different from 
those defining other behavioral proc- 
esses which, for our present purposes, 
is the differentiation of set and one- 
factor forgetting variables. The in- 
dependence of defining operations is 
critical for testing a two-factor theory 
of forgetting even though forgetting 
and set exert highly similar 
effects on dependent response meas- 
ures. Just as long as one-factor for- 
getting is defined by the retroactive 
and proactive inhibition paradigm 
with experimental extinction as the 
process, and WU is defined by other 
operations related to a different proc- 
ess such as set, they both can be re- 
tained for the description of behavior 
because they can be independently 
manipulated and measured. This 
would be the justification for a scien- 
tifically sound two-factor theory of 
forgetting. Irion’s paper (1948) made 
the two-factor distinction for verbal 
learning and consequently has given 
a basis for the objective assessment 
of set as a determiner of behavior. 
The use of set with respect to motor 
behavior has not been grounded in 
definitions as clear as those for verbal 
behavior, but this will be discussed 
later. 

Irion’s conception of set has much 
in common with those of Bell (1942) 
and Ammons (1947a) for motor 
learning where set is considered to be 
an aggregate of postural and atten- 
tive adjustments which are posi- 
tively related to performance of the 
goal response. Complex performance, 
such as the learning of a verbal 
list, involves more than the external 
goal stimuli which the experimenter 
has objectively defined and controls, 
and to which the subject links the 
goal response measured by the experi- 
menter. In addition, various second- 
ary responses are learned, such as the 
orientation patterns for visual re- 
ceptors, proper postural attitudes, 
and muscular tensions. These re- 
sponses are secondary mainly in the 
sense of not being directly measured 
but the efficiency of the goal re- 
sponding is intimately linked to them. 
Irion hypothesizes that these sec- 
ondary responses, or set, are dis- 
turbed by the subject’s activities be- 
tween original learning and recall 
and this loss of set is the underlying 
cause for the steep slope of the initial 
segment of the relearning curve 
which is called WU. The disruption 
of set could operate to induce the dec- 
rement in retention in at least three 
ways: (a) failure of the receptors to 
adequately receive the goal stimuli, 
(b) mechanical inefficiency for opti- 
mum goal responding because the 
subject does not have the proper pos- 
ture or muscular tension patterns, 
and (c) change in the internal stimu- 
lation which is part of the stimulus 
complex to which goal responses are 
conditioned (Guthrie, 1952). This 
third possible cause of WU could be 
a function of one or both of the first 
two because if the secondary re- 
sponses are disturbed, then their pat- 
terns of response-produced stimula- 
tion change and the performance 
level of the goal responses condi- 
tioned to these internal cues is re- 
duced. 

While set is disrupted by activities 
during the rest interval, and thus is a 
kind of interference theory, the hypothesis 
is distinguished from the one-factor 
interference theory of forgetting 
by emphasizing the role of 
nongoal, secondary responses and, 
importantly, by specifying that these 
secondary responses are a function of 
operations different from those defin- 
ing the strength of goal responses. 
The interference theory is con- 
cerned with practice variables which 
strengthen goal S-R sequences and in- 
crease their resistance to forgetting, 
and interfering S-R sequences which 
weaken goal S-R sequences by ex- 
perimental extinction. Set, on the 
other hand, is strengthened by per- 
formance of S-R sequences that are 
neutral with respect to S-R goal se- 
quences and which overcome WU 
by restrengthening secondary responses, 
not the strength of goal 
S-R sequences. While it is true that 
practice of goal responses appears to 
strengthen set, as the elimination of 
WU in relearning testifies, this is only 
because the goal responses are enmeshed 
in a matrix of secondary responses 
and their practice is concurrently 
accompanied by the practice 
of secondary responses. However, 
the task elements which define the 
learning problem for secondary re- 
sponses can be embodied in a sep- 
arate neutral task and can be used 
to strengthen set independently of 
goal response practicing. The weak- 
ening of set during the retention in- 
terval is also presumed to be by in- 
terfering activities neutral to goal re- 
sponses, but their characteristics are 
unspecified at this time. It might be 
presumed, for example, that general 
body movements would disrupt the 
particular postures and muscular ten- 
sion patterns acquired in the crite- 
rion task. 

Of course there may be nothing to 
the set hypothesis because all reten- 
tion loss could be one-factor forget- 
ting in terms of direct effects on goal 
responses. But even given the general 
terms within which the hypothesis is 
stated, the scientific criteria are 
broadly met for verifying that a por- 
tion of the retention loss can be 
ascribed to something other than 
one-factor forgetting, and it should 
be possible to find neutral tasks 
whose performance in a retention 
interval would reinstate set and 
abolish WU but would not yield 
habit strength increments for goal 
responses. Furthermore, if set is a 
determiner of performance as Irion 
says, practice on a neutral task 
should enhance performance on a 
criterion task before original learn- 
ing by strengthening advantageous 
secondary responses. Effects on re- 
call and original learning will be 
treated separately in the sections 
that follow. 


Verbal Behavior 


Recall. Taking a cue from Ward’s 
experiment (Ward, 1937) where the 
subject’s association of colors during 
rest benefited verbal recall, Irion 
(1949b) tested the set hypothesis by 
using one trial of a neutral color-nam- 
ing task as a warming-up activity 
just before the recall of paired adjec- 
tives after 24 hours. The subject was 
not required to memorize colors but 
only to name them as they appeared 
in the window of a memory drum. 
Color-naming, then, did not in any 
way involve practice of goal re- 
sponses but did serve to reorient the 
subject to the rhythm of responding 
and direct his visual attending, pos- 
ture, and physical adjustments in a 
manner very similar to that required 
in the criterion task and should func- 
tion to restore the subject’s set to 
respond. Irion found that a rest con- 
trol group which had conventional 
recall after the 24-hour interval dis- 
played a significant performance 
loss, but the color-naming group on 
the first recall trial was significantly 
superior to the rest control group, 
and not different from a no-rest con- 
trol group. This is in good accord 
with the set hypothesis. In re-estab- 
lishing performance in the relearning 
trials, Irion found the one trial of 
color-naming essentially equivalent 
to one trial of practice on the crite- 
rion task. A related study reported in 
the same paper demonstrated that 
first trial recall was a decreasing func- 
tion of the length of the rest interval 
up to 24 hours and that the slopes 
of the relearning curves were a func- 
tion of the length of the rest interval. 
Irion interpreted this experiment as 
being in accord with the set hy- 
pothesis and the definition of WU in 
terms of the slope of the postrest 
performance curve. Since his color- 
naming experiment demonstrated 
that retention loss occurring after 24 
hours could be eliminated by one 
trial of warming-up activity, it seems 
safe to assume that this decreasing 
recall function shows increasing WU 
and loss of set over interpolated time. 

Irion and Wham (1951) tested an 
implication of the set hypothesis that 
WU should be a decreasing function 
of the amount of set-reinstating ac- 
tivity. The criterion task was serial 
rote learning of nonsense syllables 
and the warming-up activity was rec- 
itation of three-place numbers. The 
retention interval was 35 minutes. 
Warming-up had a significant effect 
on the first recall trial, with perform- 
ance level being a positive function 
of the number-naming trials. And, 
rate of increase of the initial WU seg- 
ment of the relearning curves tended 
to be inversely related to the amount 
of warming-up. This study extends 
Irion’s earlier work and represents 
good support for the set version of 
WU. 

One interpretation that could be 
given the positive effects of warming-up 
activities on verbal recall is 
that warming-up actually amounts to 
the strengthening of generalized tech- 
niques or modes of attack (learning- 
how-to learn) and might be expected 
to appear as a higher level of goal re- 
sponding in the initial recall trials. 
According to the interference theory 
of forgetting we would expect these 
general responses to be relatively 
independent of interpolated time be- 
cause of the low probability of inter- 
fering responses occurring in casual 
experience. Set might be considered 
a more labile phenomenon, particu- 
larly if it is associated with muscular 
and postural patterns which could 
be disturbed by ordinary bodily 
movements that occur plentifully 
even during a relatively brief period 
of unstructured rest. Hartley (1948) 
considered this problem with an experiment 
which manipulated the 
temporal point of the warming-up 
activity. Using paired-associate adjectives 
as the learning task and 
color-naming as the warming-up ac- 
tivity, Hartley hypothesized that 
the set hypothesis would require 
color-naming to be effective in dissi- 
pating WU after a 24-hour rest only 
if given just prior to recall. But, if 
generalized habits were learned, then 
the expectation would be that color- 
naming immediately after original 
learning would beneficially influence 
recall. Hartley’s data confirmed the 
set hypothesis. The group which had 
color-naming just prior to recall had 
the same performance level as a no- 
rest control group, but the group 
which had color-naming immediately 
after original learning had the same 
level of recall as a control group 
which had simple rest for 24 hours. 
Hunter (1955) demonstrated a sim- 
ilar effect of the time interval be- 
tween warming-up activities and re- 
call. 

Original learning. If set is a general 
determiner of verbal behavior it is a 
reasonable expectation that warm- 
ing-up activities just prior to original 
learning should increase performance 
level just as it does for recall. Heron 
(1928) performed the first experi- 
ment along these lines. Each subject 
learned two different lists of non- 
sense syllables a day on each of three 
different days. Heron found that the 
second list of each day was learned in 
fewer trials than the first list and he 
interpreted the positive transfer from 
the first to the second list as a tem- 
porary warming-up effect which dissi- 
pated over the time period between 
practice sessions. Thune (1951) re- 
fined the earlier work of Heron. 
Three different lists of paired adjec- 
tives were administered on each of 
five different days. While there was a 
slow learning-how-to learn effect as 
evidenced by the gradual upward 
trend of overall performance, the 
within-session gains from the first to 
third list each day were dramatic. 
Moreover, these gains largely disappeared 
during the rest interval between 
sessions. Like Heron, Thune interpreted 
the gains within a session 
as a warming-up effect, with practice 
of each list exercising a facilitative 
effect on the learning of subsequent 
lists during a session because of set 
being re-established. Being labile, set 
is lost between sessions and thus 
there is relatively poor performance 
again at the beginning of each session. 

Another experiment by Thune 
(1950) showed the positive effects of 
two kinds of warming-up activity on 
the original learning of a criterion 
list of paired nouns. Using a list of 
paired adjectives as the neutral 
warming-up activity, Thune held the 
total amount of practice on it con- 
stant but gave part of the practice on 
the first day and part of it on the 
second day just prior to learning the 
criterion list. The experimental variable 
was the proportion of the total 
practice for the warming-up list given 
on the second day. Positive transfer 
to the criterion list was found to be 
an increasing function of this propor- 
tion, and the benefits were located in 
the early trials where the primary 
growth of set presumably takes place. 
In another experiment reported in the 
same paper, Thune found that 10 
trials of guessing the color that would 
appear next under the shutter of the 
memory drum had about the same 
positive transfer effects as 10 trials of 
the neutral list of nouns if both were 
presented just before practice of the 
criterion list. By using appropriate 
control groups, Thune further showed 
that 10 trials of color-guessing ad- 
ministered 24 hours before the test 
list had no effect and he was drawn to 
the same conclusions as Hartley 
(1948) that the temporal point of 
warming-up is critical, and neutral 
warming-up activities function to 
strengthen labile set factors rather 
than stable habits. All of these find- 
ings of Thune’s are in agreement with 
the set hypothesis. Hamilton (1950) 
performed an experiment closely co- 
ordinated with the Thune investiga- 
tion in which the time interval be- 
tween the warming-up list and the 
test list was varied. Performance on 
the criterion list was an exponential 
decay function of the retention inter- 
val and was interpreted as the curve 
for the loss of unstable set factors. 
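
One way to write down an exponential decay function of the kind just described is sketched below. The source reports only the form of the relation; the symbols, the asymptote term, and the time constant are assumptions introduced here for illustration, and no parameter values are implied.

```latex
% Assumed notation (not from Hamilton, 1950):
%   P(t)       performance on the criterion list when the warming-up list
%              precedes it by an interval t
%   P_0        performance when the warming-up list immediately precedes (t = 0)
%   P_\infty   performance with no remaining benefit of warming-up
%   \tau       time constant for the loss of set
P(t) = P_{\infty} + \left(P_{0} - P_{\infty}\right) e^{-t/\tau}, \qquad t \ge 0
```

On this reading the benefit of warming-up, P(t) minus the asymptote, falls by a constant proportion in each equal unit of time, which is the sense in which set would be called labile.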


Motor Behavior 


There is no intrinsic reason why 
the set hypothesis should not apply 
to both motor behavior and verbal 
behavior and, indeed, this would be a 
desirable generality. WU-like charac- 
teristics are commonly found in post- 
rest motor performance curves and 
set is a promising candidate for ex- 
planation, but the problem for motor 
behavior lies in differentiating set and 
one-factor forgetting as determiners 
of the WU segment of the curve. The 
means of doing this for verbal learn- 
ing and recall mainly was to show 
positive effects on goal behavior from 
performance on neutral tasks that 
could not reasonably be thought of as 
strengthening goal responses and 
therefore must be enhancing some- 
thing else—set. The difficulty for 
motor behavior at this time is that 
these neutral tasks have not yet been 
discovered, if they exist at all, and so 
the WU-like effects can just as well 
be explained by a one-factor forget- 
ting theory. Certainly this difficulty 
does not negate the set hypothesis, 
and actually it may only be a tem- 
porary impasse, but it does urge re- 
straint upon us. 

Ammons’ research on WU illustrates 
the problem (Ammons, 1947a, 
1947b, 1950). The basis of his work is 
a miniature theoretical system for 
rotary pursuit performance (Am- 
mons, 1947a) which leans heavily on 
Hull’s theoretical system for explana- 
tory constructs (Hull, 1943) but has 
the added feature of set as a construct 
term to explain the WU segment. 
Ammons specifies set and WU as 
inversely related, and relates them to 
the independent variables of number 
of prerest practice sessions, total 
amount of prior practice, duration of 
practice sessions, duration of inter- 
polated rest, and the amount of prac- 
tice that has occurred in the present, 
ongoing practice period. In subse- 
quent papers on empirical research 
(Ammons, 1947b, 1950) Ammons reports 
a number of relationships between 
WU and these independent 
variables but, since the one-factor in- 
terference theory of forgetting would 
also relate retention to rest and to 
practice variables defining habit 
strength in the task, there is no foun- 
dation for the assumption that these 
independent variables are manipulating 
WU through changes in set. 
Irion (1949a) performed similar re- 
search with the Rotary Pursuit Test 
and related WU to the amount of pre- 
rest practice and the duration of rest. 
His paper reflects the same method- 
ological problem. 

Efforts to locate a neutral task 
which would influence WU in the re- 
call of motor performance, as well as 
the level of original learning, have 
met with failure. The most thorough 
experimental attempt was by Am- 
mons (1951) using the Rotary Pur- 
suit Test. His subjects were admin- 
istered initial practice, rest, and a 
postrest practice session. Set-rein- 
stating activities were either watch- 
ing the disk, blindfolded manual 
performance of the rotary motion by 
holding a small rivet set in the rotor 
plate, or imaginary practice where 
the subject was merely to think 
about practicing. These activities 
were administered either before ini- 
tial practice, before the postrest 
practice session, or before both prac- 
tice periods. No effects were observed 
from any of the experimental treat- 
ments. Walker, DeSoto, and Shelly 
(1957) performed a bilateral transfer 
experiment on the Rotary Pursuit 
Test. Original practice was with one 
hand and, following rest, practice 
was resumed with the other hand. 
One of the experimental conditions 
was to have one trial of practice just 
before the postrest session with the 
prerest practice hand to see if it could 
have a warming-up effect on the 
transfer hand. WU was found in per- 
formance on the transfer hand but it 
was unaffected by the warming-up 
procedure and they concluded that 
WU must be quite specific to an
effector. Hamilton and Mola (1953) used
a finger maze and evaluated the effect
of practice on five different mazes on
performance in a criterion maze.
They used an experimental design 
similar to Hartley (1948) and Thune 
(1950) and gave practice on the five 
warming-up mazes either 24 hours or 
immediately preceding the test maze. 
Positive transfer to the criterion 
maze was found but it was about the 
same for both warm-up groups and 
the authors concluded that practice 
on the five mazes exerted only a gen- 
eral practice effect and had no warm- 
ing-up properties. A small encour- 
aging sign to counter this negative 
evidence is found in a study by 
Adams (1955) on sources of work in- 
hibition in complex motor perform- 
ance. The Rotary Pursuit Test was 
used and during an intersession
period one group was required to 
observe a partner’s performance and 
press a button every time he judged 
him to be on target. This activity 
produced work inhibition but it also 
tended to result in less WU than 
found for control groups. The re- 
duced WU was a secondary finding 
in a study on another topic but it is a
lead on a likely set-reinstating activ- 
ity for motor performance. 


Negative Evidence 


Ordinarily the amount of evidence 
which has been cited in support of the 
set hypothesis for verbal learning 
would be sufficient to give a good 
measure of security to the two-factor 
theory in psychology, but unfortunately
there is a disconcerting number
of negative findings. In a careful
effort to replicate Irion’s verbal
learning study (1949b), Rockway and
Duncan (1952) were unable to
reproduce Irion’s results and show
an effect of color-naming on
recall. Similarly, Withey, Buxton, and
Elkin (1949) and Hovland and Kurtz
(1951) failed to show an influence of
color-naming on verbal recall.
Underwood (1952), studying the serial
learning of nonsense syllables, was 
unable to demonstrate that the 
warming-up activity of number-nam- 
ing prior to recall produced an effect 
on WU after 24 hours of rest.
Underwood said that this did not necessarily
contradict previous findings by Irion 
and others because earlier studies did 
not use the same subjects in several 
experimental conditions as he had 
done. Dinner and Duncan (1959) 
hypothesized that the unreliability of 
the effects of color-naming on verbal 
recall might be a function of degree of 
original learning. Using a low, me- 
dium, and high degree of original 
learning of paired adjectives, they 
found that color-naming influenced 
recall only when level of original 
learning was high. They concluded 
that Irion’s positive use of color- 
naming (Irion, 1949b) based on a 
medium degree of original learning
should be considered sampling error
and discounted. However, this judgment
should be regarded with caution
because it does not consider the works
of Hartley (1948) and Irion and
Wham (1951) who all obtained posi- 
tive effects of warming-up activities 
on verbal recall when low or inter- 
mediate levels of original learning 
were used. The Dinner and Duncan 
investigation makes an original
contribution in showing an effect of the
amount of original learning, but it 
cannot be reconciled at this time with 
verbal learning studies which effec- 
tively used warming-up activities to 
enhance recall. 

Another disturbing consideration 
for understanding WU and the set 
hypothesis is that there are tasks 
where performance in the initial post- 
rest trials does not show WU. The 
inverted alphabet printing task has 
enjoyed moderate popularity for the
study of work inhibition and the
data never show WU (Archer, 1954; 
Eysenck, 1956; Kimble, 1949; Wasser- 
man, 1951). Silver (1952) used the 
inverted alphabet printing task to 
investigate the interaction of warm- 
up activity on WU and work inhibi- 
tion, but since he used performance 
on the first postrest trial for his com- 
parisons there is no reason to believe 
that he was manipulating WU or, 
indeed, whether his data even
displayed WU. Other investigators using
inverted alphabet printing failed to 
show WU, and it is unlikely that 
WU as it has been defined in terms of 
a rapid increase in performance on 
the initial postrest trials was even 
present in Silver’s data. Bilodeau 
(1952a, 1952b) investigated work
inhibition by manipulating the physical
load required to turn a manual crank, 
and found no WU in his data. Doten 
(1955), in a study of interference, 
used a task where the subject was 
presented the printed names of 
colors, but where the color of the let- 
ters in the name was different than 
indicated by the name. The word 
“Red” might be printed in blue, for
example. The task of the subject in 
original learning was to respond by 
stating the actual color of the
lettering, and no WU was indicated.
The initial segments of performance 
curves on each day had an immedi- 
ate decrease in speed of responding, 
not the rapid increase which is char- 
acteristic of WU segments. Tasks 
such as these raise serious definitional 
problems for set, or any other WU 
hypothesis for that matter. We can- 
not expect any explanatory hypothe- 
sis to enjoy a good degree of success 
until the tasks in which WU occurs 
have been established. Ideally the 
set hypothesis should contain
statements relating WU and task
characteristics.


Inhibition


The principal advocate of an inhi- 
bition hypothesis is Eysenck (1956) 
who interprets WU as mainly being 
attributable to the extinction of 
Hull’s conditioned inhibition (Hull, 
1943). Eysenck does not completely 
deny the set hypothesis, but rather 
considers loss of set a lesser contribu- 
tor to WU, with the extinction of con- 
ditioned inhibition being the primary 
reason for the trend of the initial 
postrest segment. It will be recalled 
that Hull has a two-factor theory of 
inhibition. One construct, $I_R$, is an
increasing function of the number of
responses and amount of physical
work, and a decreasing function of
the rest interval. Moreover, $I_R$ has
drive properties and its dissipation is
regarded as drive reduction. Since
drive reduction in Hull’s system is
the basis of reinforcement, an
increment of habit strength for the
ongoing response is accrued whenever
$I_R$ dissipates. Because the subject is
resting when $I_R$ is dissipating, it is
theorized that a resting response is
strengthened which is antagonistic
to the goal response. The habit
construct for the resting response is the
second inhibitory factor, ${}_SI_R$. These
two types of inhibition summate and
subtract from the excitatory
potential (${}_SE_R$) for the goal response to
yield effective excitatory potential
(${}_S\bar{E}_R$) which is the primary
determiner of overt performance level.
The massed group in Figure 1 can be 
used to illustrate Eysenck’s applica- 
tion of Hull’s inhibition theory to 
WU. In the first session the subject
responds continuously and $I_R$
accrues. Then, over rest, $I_R$ dissipates
and an increment of ${}_SI_R$ develops.
The failure of performance on the
first postrest trial to reminisce to the
level of the distributed group is taken
as evidence for the presence of ${}_SI_R$.
When the subject begins practice in
the second session the goal response 
is now being reinforced and the non- 
reinforced resting response under- 
goes experimental extinction. This 
period of extinction of the resting re- 
sponse is revealed as the WU seg- 
ment, according to Eysenck. One im- 
mediate prediction from Eysenck’s 
hypothesis is that little WU should
be found under well-spaced practice
conditions where a negligible amount
of $I_R$ is generated on each trial, and
thus negligible ${}_SI_R$. Eysenck tested
this deduction using the Rotary Pur- 
suit Test and, in accord with his pre- 
diction, found WU under conditions 
of massed practice but not distributed
practice. His findings and conclusions
are tenuous, however, because
of the instances in the experimental
literature showing WU under
conditions of widely distributed prac- 
tice on the Rotary Pursuit Test. Fig- 
ure 1 is a good example (Adams, 
1952b). Other examples are Am- 
mons (1950), Denny, Frisbey, and 
Weaver (1955), Digman (1959), 
Kimble and Shatel (1952), and 
Jahnke and Duncan (1956). There is 
no immediate explanation for Ey- 
senck’s unusual finding, but WU un- 
der conditions of well-distributed 
practice is a commonplace finding 
and suggests that Eysenck’s hy- 
pothesis cannot be taken seriously. 

Adams (1952b) entertained a different
inhibition hypothesis. Observing
that much of the evidence for
WU came from studies on the Rotary 
Pursuit Test under conditions of 
massed practice, he deduced the 
characteristics of a postrest perform- 
ance curve from a negatively acceler- 
ated growth of reaction potential 
with trials and an ogival function for 
the accrual of work inhibition when 
practice is massed. It was predicted 
that WU should not appear under
conditions of distributed practice.
The occurrence of clearcut WU when
training was widely distributed
(Figure 1) led to rejection of the
hypothesis.
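
The bookkeeping behind Eysenck’s account, with reactive inhibition ($I_R$) accruing during practice and conditioned inhibition (${}_SI_R$) growing as $I_R$ dissipates over rest, can be made concrete with a toy simulation. The sketch below is only an illustration under stated assumptions: the function names, the linear accrual of $I_R$, its complete dissipation over rest, the proportional conversion into ${}_SI_R$, the geometric extinction rate, and every numerical value are hypothetical choices, not quantities taken from Hull (1943) or Eysenck (1956). Excitatory potential is held constant, so learning is not represented.

```python
"""Toy sketch of the conditioned-inhibition account of warm-up decrement (WU).

Every functional form and number here is an illustrative assumption.
"""


def practice(excitatory, i_r, s_i_r, n_trials, accrual, extinction):
    """Run one practice session and return (scores, i_r, s_i_r).

    excitatory -- excitatory potential for the goal response (held constant)
    accrual    -- assumed amount of I_R added by each response
    extinction -- assumed fraction of sI_R lost on each practice trial
    """
    scores = []
    for _ in range(n_trials):
        # The two inhibitions summate and subtract from excitatory potential.
        scores.append(excitatory - (i_r + s_i_r))
        i_r += accrual               # reactive inhibition grows with responding
        s_i_r *= 1.0 - extinction    # the resting response extinguishes
    return scores, i_r, s_i_r


def rest(i_r, s_i_r, conditioning):
    """Assumed complete rest: all I_R dissipates and a fixed proportion of
    the dissipated amount is added to sI_R."""
    return 0.0, s_i_r + conditioning * i_r


def postrest_scores(accrual_per_trial):
    """Ten prerest trials, a rest, then five postrest trials."""
    _, i_r, s_i_r = practice(10.0, 0.0, 0.0, 10, accrual_per_trial, 0.5)
    i_r, s_i_r = rest(i_r, s_i_r, conditioning=0.5)
    scores, _, _ = practice(10.0, i_r, s_i_r, 5, accrual_per_trial, 0.5)
    return scores


massed = postrest_scores(accrual_per_trial=0.1)  # appreciable I_R per trial
spaced = postrest_scores(accrual_per_trial=0.0)  # negligible I_R per trial

if __name__ == "__main__":
    print("massed:", [round(s, 3) for s in massed])
    print("spaced:", [round(s, 3) for s in spaced])
```

Under these assumed values the massed condition rises over the first three postrest trials while the spaced condition stays flat, which is the pattern the hypothesis predicts. The sketch shows only the logic of the prediction; as the text notes, WU is commonly reported under widely distributed practice.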

At present it must be concluded 
that no convincing evidence exists in 
support of an inhibition explanation
of WU.


WARM-UP IN ANIMALS


The main concern over WU has 
been with human subjects, but it is 
noteworthy that the phenomenon 
also has been observed in animals. 
The following studies are not meant 
to represent an exhaustive search of 
the literature on animal behavior, 
but rather are intended to show the 
ubiquity of WU-like effects and that 
its characteristics are not found only 
in human response records. Schlos- 
berg (1934, 1936) interprets as WU 
the failure of occurrence of a well- 
learned conditioned response in the 
white rat on the first few trials of a 
learning session. Ellson (1938), in 
studying extinction of a bar-pressing 
habit in the rat, found rate of re- 
sponse slower in the first fifth of the 
extinction trials than in the second 
fifth. He interpreted this as WU and 
explained it in terms of Guthrie's 
theory which holds that the stimuli 
to which a response is learned include 
internal stimuli resulting from pos- 
ture, movement, etc. Later responses 
are partly conditioned to the re- 
sponse-produced stimuli of earlier re- 
sponses and we would expect that 
later responses in a series would have 
greater strength because of the pres- 
ence of the response-produced stimuli 
to which they are conditioned. In the 
latter part of extinction the effects of 
nonreward overcome this trend and 
the performance level then decreases 
systematically. Finger (1942) used 
rats in a straight runway situation
and found WU revealed when an
extinction series was administered after
24 hours. The second extinction trial 
actually had performance superior to 
that of the first extinction trial—a 
finding contrary to expectation for an 
extinction series. Finger’s finding for 
extinction is quite similar to Ellson’s. 
Verplanck (1942) reported WU for 
rats in a simple running task. Like 
many investigators of human behavior,
these animal researchers
freely labeled decrements in initial
segments of postrest performance
records as WU although the decrements
could just as well, and more
economically, have been explained by 
the one-factor forgetting hypothesis. 


DISCUSSIONS AND CONCLUSIONS 


Virtually all support for the two- 
factor theory of forgetting is em- 
bodied in experiments which have 
demonstrated that WU is reduced or 
eliminated by repetition of responses 
that orient the subject to the general 
task demands (e.g., color-naming) 
but which do not involve direct prac- 
tice of goal responses. By themselves 
these experiments might be sufficient 
to establish set as a second factor nec- 
essary for the explanation of reten- 
tion loss, but the studies where set- 
reinstating activities have failed to 
influence recall in both motor and 
verbal tasks, and the tasks where no 
WU whatsoever has been found, 
leave the second factor in doubt. 
There is not sufficient evidence to re- 
ject the set hypothesis but neither 
are there grounds for firmly retaining 
it. Certainly it is the most tenable of 
all hypotheses advanced, but a great 
deal of careful research seems re- 
quired before the set hypothesis, and 
thus the two-factor theory of forget- 
ting, can be accepted or rejected with 
confidence. 

It is unlikely that a decision ever
will be made about the set hypothesis
unless it receives a more thorough
testing than it has in the past. The 
set-reinstating experiments are very 
broadly derived from the hypothesis 
and have not been a test of the more 
explicit implications of set. By view- 
ing the set hypothesis in its more de- 
tailed aspects, and in attempting to 
develop specific experiments and 
measures to empirically verify these 
details, it should be possible not only 
to clarify the status of the set hy- 
pothesis but also to determine why, 
for example, some tasks display 
WU and others do not. For example, 
Irion (1948), Ammons (1947a), and 
Bell (1942) all contend that the 
acquisition of set can be the learning 
of beneficial postures and muscular 
tensions that facilitate the occurrence 
of goal response sequences. Rest pe- 
riod activities disturb the favorable 
set and the WU segment of a post- 
rest performance curve represents the 
reacquisition of these favorable bod- 
ily attitudes. If there is anything to 
this version of the set hypothesis, it 
would seem fruitful to explore the 
characteristics of bodily tensions by 
direct measurement and then relate 
them to changes in performance of the
goal response. Davis and his associ- 
ates have performed a number of 
studies (e.g., Davis, 1940, 1956) 
showing the relationship between the 
characteristics of overt responding 
and muscular tensions as revealed 
by electromyographic measurement 
techniques. Davis (1956) does not 
believe that the muscular substrata 
and the overt goal responses need be 
conceptualized as fundamentally dif- 
ferent. A state of tension in skeletal 
muscle is the same as any other 
muscular contraction, i.e., it is a re- 
sponse configuration. Davis (1956) 
says: 


Muscular tensions would then be themselves
responses to stimuli, many being small
responses, detectable only with instruments, but
with no firm boundary between them and the 
larger muscular activities associated with 
movement (p. 2). 


Davis’ work is suggestive for the 
set hypothesis because it strongly 
hints that the pattern of electromyo- 
graphic measures of muscular tension 
during the WU segment of the post- 
rest performance curve would have 
levels and patterns of muscular ten- 
sion different from final prerest per- 
formance, and these levels and pat- 
terns will have shifted in the direction 
associated with poorer performance. 
Moreover, the reacquisition of prerest 
values and patterns of muscular ten- 
sions should parallel the trend of the 
WU segment. Furthermore, and im- 
portantly, it suggests that neutral 
set-reinstating activities will pro- 
duce electromyographic changes sig- 
nifying that the favorable muscular 
tensions existing at final prerest per- 
formance are being re-established. 

There are difficulties in opera- 
tionally distinguishing between an 
electromyographically-verified muscle 
tension version of the set hypothesis 
and Irion’s alternative Guthrian hy- 
pothesis that loss of set is disturbance 
of internal stimuli to which goal re- 
sponses are partly conditioned. If 
we have changes in the muscle tension
secondary responses and this, in
turn, results in a lower level for goal
responses, we cannot be sure that the 
lower level is due to quasi-mechanical 
considerations where muscular ten- 
sions underlie useful postures and 
bodily attitudes, or whether it is due 
to changes in the population of stim- 
uli to which the goal responses have 
been conditioned. Despite the potential
difficulties of interpreting the
primary effects of muscle tension
secondary responses on performance of
the goal responses, it would be a
fundamental finding to show a
systematic covariation of
electromyographic measures and WU
phenomena. The evanescent quality of set
could benefit from a diversity of ap- 
proaches at this time to provide 
clues for a reconciliation of inconsist- 
encies among the various experi- 
mental findings. 

The delineation of set and its role 
in retention will sharpen our under- 
standing of the retention loss problem 
and will improve our efforts to predict 
and control it. Underwood (1957) 
has shown that our frequent use of 
the same subjects in several labora- 
tory experiments has led us to greatly 
overestimate the retention loss for 
verbal responses because the experi- 
menter was unwittingly contaminat- 
ing his retention scores with proac- 
tive inhibition effects. But even 
given this downward revision of
retention loss, we are still faced with
showing the proportion of it
attributable to interference with goal
responses and the part assignable to
loss of set. Irion’s experiment
(1949b), for example, showed that 
one trial of color-naming almost com- 
pletely eliminated the verbal reten- 
tion loss and therefore all of the loss 
could be described in terms of change 
in set. This suggests that if the two- 
factor theory eventually becomes 
better established in fact, the
paradigm of retention studies will have
to include groups whose performance 
of set-reinstating activities will allow 
a parsing of set and interference com- 
ponents. Interference with goal re- 
sponses may be a smaller contributor 
to retention loss than we now
surmise. The first research need,
however, is a more incisive laboratory
attack on the validity of set and its
underlying nature. 


REFERENCES 


ADAMS, J. A. The influence of the time interval
after interpolated activity on psychomotor
performance. USAF Hum. Resour. Res. Cent.
res. Bull., 1952, No. 52-11. (a)

ADAMS, J. A. Warm-up decrement in performance
on the pursuit-rotor. Amer. J. Psychol.,
1952, 65, 404-414. (b)

ADAMS, J. A. A source of decrement in
psychomotor performance. J. exp. Psychol.,
1955, 49, 390-394.

AMMONS, R. B. Acquisition of motor skill:
I. Quantitative analysis and theoretical
formulation. Psychol. Rev., 1947, 54,
263-281. (a)

AMMONS, R. B. Acquisition of motor skill:
II. Rotary pursuit performance with continuous
practice before and after a single
rest. J. exp. Psychol., 1947, 37, 393-411. (b)

AMMONS, R. B. Acquisition of motor skill:
III. Effects of initially distributed practice
on rotary pursuit performance. J. exp.
Psychol., 1950, 40, 777-787.

AMMONS, R. B. Effects of prepractice activities
on rotary pursuit performance. J.
exp. Psychol., 1951, 41, 187-191.

ARAI, T. Mental fatigue. Teach. Coll. Contri.
Educ., 1912, No. 54.

ARCHER, J. E. Postrest performance in motor
learning as a function of prerest degree of
distribution of practice. J. exp. Psychol.,
1954, 47, 47-51.

BARCH, A. M. Warm-up in massed and distributed
pursuit rotor performance. J.
exp. Psychol., 1954, 47, 357-361.

BELL, H. M. Rest pauses in motor learning as
related to Snoddy’s hypothesis of mental
growth. Psychol. Monogr., 1942, 54(1,
Whole No. 243).

BILODEAU, E. A. Decrements and recovery
from decrements in a simple work task with
variation in force requirements at different
stages of practice. J. exp. Psychol.,
1952, 44, 96-100. (a)

BILODEAU, E. A. Massing and spacing phenomena
as a function of prolonged and extended
practice. J. exp. Psychol., 1952, 44,
108-113. (b)

BRIGGS, G. E. Acquisition, extinction, and
recovery function in retroactive inhibition.
J. exp. Psychol., 1954, 47, 285-293.

BRIGGS, G. E. Retroactive inhibition as a
function of the degree of original and
interpolated learning. J. exp. Psychol., 1957,
53, 60-67.

BUGELSKI, B. R., & CADWALLADER, T. C. A
reappraisal of the transfer and retroaction
surface. J. exp. Psychol., 1956, 52, 360-366.

DAVIS, R. C. Set and muscular tension.
Indiana U. Publ., Sci. Ser., 1940, No. 10.

DAVIS, R. C. Electromyographic factors in
aircraft control: The relation of muscular
tension to performance. USAF Sch. Aviat.
Med. Rep., 1956, No. 55-122.

DENNY, R. M., FRISBEY, N., & WEAVER, J.,
Jr. Rotary pursuit performance under alternate
conditions of distributed and
massed practice. J. exp. Psychol., 1955, 49,
48-54.

DIGMAN, J. M. Growth of a motor skill as a
function of distribution of practice. J. exp.
Psychol., 1959, 57, 310-316.

DINNER, JUDITH E., & DUNCAN, C. P. Warm-up
in retention as a function of degree of
verbal learning. J. exp. Psychol., 1959, 57,
257-261.

DORÉ, L. R., & HILGARD, E. R. Spaced practice
as a test of Snoddy’s two processes in
mental growth. J. exp. Psychol., 1938, 23,
359-374.

DOTEN, G. W. The effects of rest periods on
interference of a well-established habit. J.
exp. Psychol., 1955, 49, 401-406.

ELLSON, D. G. Quantitative studies of the
interaction of simple habits: I. Recovery
from specific and generalized effects of
extinction. J. exp. Psychol., 1938, 23, 339-358.

EYSENCK, H. J. “Warm-up” in pursuit rotor
learning as a function of the extinction of
conditioned inhibition. Acta psychol.,
Amst., 1956, 12, 349-370.

FINGER, F. W. Retention and subsequent extinction
of a simple running response following
varying conditions of reinforcement.
J. exp. Psychol., 1942, 31, 120-133.

GIBSON, J. J. A critical review of the concept
of set in contemporary experimental
psychology. Psychol. Bull., 1941, 38, 781-817.

GUTHRIE, E. R. The psychology of learning.
(Rev. ed.) New York: Harper, 1952.

HAMILTON, C. E. The relationship between
length of interval separating two learning
tasks and performance on the second task.
J. exp. Psychol., 1950, 40, 613-621.

HAMILTON, C. E., & MOLA, W. R. Warm-up
effect in human maze learning. J. exp.
Psychol., 1953, 45, 437-441.

HARTLEY, T. C. Retention as a function of
the temporal position of an interpolated
warming-up task. Unpublished MA thesis,
University of Illinois, 1948.

HERON, W. T. The warming-up effect in
learning nonsense syllables. J. genet.
Psychol., 1928, 35, 219-228.

HILGARD, E. R., & SMITH, M. B. Distributed
practice in motor learning: Score changes
within and between daily sessions. J. exp.
Psychol., 1942, 30, 136-146.

HOVLAND, C. I., & KURTZ, K. H. Experimental
studies in rote-learning theory: IX.
Influence of work-decrement factors on
verbal learning. J. exp. Psychol., 1951, 42,
265-272.

HULL, C. L. Principles of behavior. New
York: Appleton-Century, 1943.

HUNTER, I. A. The warming-up effect in recall
performance. Quart. J. exp. Psychol.,
1955, 7, 166-175.

IRION, A. L. The relation of “set” to retention.
Psychol. Rev., 1948, 55, 336-341.

IRION, A. L. Reminiscence in pursuit-rotor
learning as a function of length of rest and
of amount of pre-rest practice. J. exp.
Psychol., 1949, 39, 492-499. (a)

IRION, A. L. Retention and warming-up
effects in paired associate learning. J. exp.
Psychol., 1949, 39, 669-675. (b)

IRION, A. L., & WHAM, DOROTHY S. Recovery
from retention loss as a function of
amount of pre-recall warming-up. J. exp.
Psychol., 1951, 41, 242-246.

JAHNKE, J. C., & DUNCAN, C. P. Reminiscence
and forgetting in motor learning after
extended rest intervals. J. exp. Psychol.,
1956, 52, 273-282.

KIMBLE, G. A. An experimental test of two-factor
theory of inhibition. J. exp. Psychol.,
1949, 39, 15-23.

KIMBLE, G. A., & SHATEL, R. B. The relationship
between two kinds of inhibition
and the amount of practice. J. exp. Psychol.,
1952, 44, 355-359.

LEWIS, D., & McALLISTER, DOROTHY E. An
investigation of individual susceptibility
to interference. USN Spec. Dev. Cent. tech.
Rep., 1950, No. 938-1-10.

LEWIS, D., McALLISTER, DOROTHY E., &
ADAMS, J. A. Facilitation and interference
in performance on the modified
Mashburn apparatus: I. The effects of
varying the amount of original learning. J.
exp. Psychol., 1951, 41, 247-260.

LEWIS, D., SHEPHARD, A. H., & ADAMS, J. A.
Evidences of associative interferences in
psychomotor performance. Science, 1949,
110, 271-273.

LEWIS, D., SMITH, P. N., & McALLISTER,
DOROTHY E. Retroactive facilitation and
interference in performance on the Modified
Two-Hand Coordinator. J. exp. Psychol.,
1952, 44, 44-50.

MELTON, A. W. (Ed.) Apparatus tests. (AAF
Aviat. Psychol. Program res. Rep. No. 4)
Washington, D. C.: United States Government
Printing Office, 1947.

MOSSO, A. Fatigue. (Trans. by M. Drummond)
New York: Putnam, 1906.

OSGOOD, C. E. The similarity paradox in human
learning. Psychol. Rev., 1949, 56,
132-143.

OSGOOD, C. E. Method and theory in experimental
psychology. New York: Oxford,
1953.

REYNOLDS, B., & ADAMS, J. A. Psychomotor
performance as a function of initial level of
ability. Amer. J. Psychol., 1954, 67, 268-277.

ROBINSON, E. S. Work of the integrated organism.
In C. Murchison (Ed.), Handbook
of general experimental psychology, 1934.

ROBINSON, E. S., & HERON, W. T. The warming-up
effect. J. exp. Psychol., 1924, 7,
81-97.

ROCKWAY, M. R., & DUNCAN, C. P. Pre-recall
warming-up in verbal retention. J.
exp. Psychol., 1952, 43, 305-312.

SCHLOSBERG, H. Conditioned responses in the
white rat. J. genet. Psychol., 1934, 45, 303-335.

SCHLOSBERG, H. Conditioned responses in the
white rat: II. Conditioned responses based
upon shock to the foreleg. J. genet. Psychol.,
1936, 49, 107-138.

SHEPHARD, A. H. Losses of skill in performing
the standard Mashburn task arising
from different levels of learning on the
reversed task. USN Spec. Dev. Cent. tech.
Rep., 1950, No. 938-1-9.

SHEPHARD, A. H., & LEWIS, D. Prior learning
as a factor in shaping performance curves.
USN Spec. Dev. Cent. tech. Rep., 1950, No.
938-14.

SILVER, R. J. Effect of amount and distribution
of warming-up activity on retention in
motor learning. J. exp. Psychol., 1952, 44,
88-95.

SNODDY, G. S. Evidence for two opposed
processes in mental growth. Lancaster:
Science, 1935.

STEINBERG, HANNAH, & SUMMERFIELD, A.
Influence of a depressant drug on acquisition
in rote learning. Quart. J. exp. Psychol.,
1957, 9, 138-145.

SUMMERFIELD, A., & STEINBERG, HANNAH.
Reducing interference in forgetting. Quart.
J. exp. Psychol., 1957, 9, 146-154.

SUMMERFIELD, A., & STEINBERG, HANNAH.
Using drugs to alter memory experimentally
in man. In P. B. Bradley, P. Deniker, &
C. Radouco-Thomas (Eds.), Neuro-psychopharmacology.
Houston: Elsevier, 1959.
Pp. 481-483.

THORNDIKE, E. L. Educational psychology.
Vol. 3. New York: Teachers Coll.,
Columbia Univer., 1914.

THUNE, L. E. The effects of different types of
preliminary activities on subsequent learning
of paired-associate learning. J. exp.
Psychol., 1950, 40, 423-438.

THUNE, L. E. Warm-up effect as a function
of level of practice in verbal learning. J.
exp. Psychol., 1951, 42, 250-256.

UNDERWOOD, B. J. Retroactive and proactive
inhibition after five and forty-eight
hours. J. exp. Psychol., 1948, 38, 29-38. (a)

UNDERWOOD, B. J. “Spontaneous recovery”
of verbal associations. J. exp. Psychol.,
1948, 38, 429-439. (b)

UNDERWOOD, B. J. Studies of distributed
practice: VI. The influence of rest-interval
activity in serial learning. J. exp. Psychol.,
1952, 43, 329-340.

UNDERWOOD, B. J. Interference and forgetting.
Psychol. Rev., 1957, 64, 49-60.

UNDERWOOD, B. J., & POSTMAN, L.
Extraexperimental sources of interference in
forgetting. Psychol. Rev., 1960, 67, 73-95.

VERPLANCK, W. S. The development of
discrimination in a simple locomotor habit. J.
exp. Psychol., 1942, 31, 441-464.

WALKER, L. C., DESOTO, C. B., & SHELLY,
M. W. Rest and warm-up in bilateral transfer
on a pursuit rotor task. J. exp. Psychol.,
1957, 53, 394-404.

WARD, L. B. Reminiscence and rote learning.
Psychol. Monogr., 1937, 49(4, Whole No.
220).

WASSERMAN, H. N. The effect of motivation
and amount of pre-rest practice upon
inhibitory potential in motor learning. J.
exp. Psychol., 1951, 42, 162-172.

WATSON, J. B. Psychology from the standpoint
of a behaviorist. Philadelphia: Lippincott,
1919.

WELLS, F. L. Normal performance on the
tapping test before and during practice
with special reference to fatigue phenomenon.
Amer. J. Psychol., 1908, 19, 437-483.

WITHEY, S., BUXTON, C. E., & ELKIN, A.
Control of rest interval activities in serial
verbal learning. J. exp. Psychol., 1949, 39,
173-176.


(Received July 15, 1960) 





Psychological Bulletin 
1961, Vol. 58, No. 4, 274-298 


ON THE REFORMULATION OF INHIBITION 
IN HULL’S SYSTEM 


ARTHUR R. JENSEN 
University of California 


Among the least satisfactory ele- 
ments of Hull’s behavior system is 
his formulation of inhibition. As a 
result, there have been several at- 
tempts in recent years to reformu- 
late Hull’s theory with respect to the 
inhibition variables in the equation 
for effective reaction potential (${}_S\bar{E}_R$).
The present paper critically examines 
these reformulations in the light of 
relevant experimental evidence. The 
conclusions to which this examina- 
tion leads are that these reformula- 
tions have not been an improvement 
over Hull and that this kind of re- 
formulation itself is a futile approach 
to the problem of improving Hullian- 
type learning theory. 

In all versions of his theory Hull
(1943, 1951, 1952) formulated
“effective reaction potential” (${}_S\bar{E}_R$) as
being essentially a function of “drive”
($D$) and “habit strength” (${}_SH_R$),
related multiplicatively (i.e., $D \times {}_SH_R$),
minus “reactive inhibition” ($I_R$) and
“conditioned inhibition” (${}_SI_R$),
related additively (i.e., $I_R + {}_SI_R$). Thus:


$${}_S\bar{E}_R = (D \times {}_SH_R) - (I_R + {}_SI_R)$$


Most of the attempts to reformu- 
late Hull’s equation have been the 
result of logical, or at times merely 
verbal, rather than empirical
considerations. For example, Hilgard’s
(1956, p. 139) criticism is directed at
the fact that Hull did not carry out
the logical implications of his
statement that $I_R$ is a “negative drive
state.” As such, $I_R$ logically should
subtract from $D$ (i.e., $D - I_R$) and,
like $D$, should interact
multiplicatively with habit strength (i.e.,
$I_R \times {}_SH_R$). Hilgard also suggests
that, since ${}_SI_R$ is a negative habit, it
should interact multiplicatively with
$I_R$. Thus, Hilgard’s proposed
reformulation of the equation for net
reaction potential results in the
following:


$${}_S\bar{E}_R = [(D - I_R) \times {}_SH_R] - (I_R \times {}_SI_R)$$


This new formulation seems to be 
more consistent with some of Hull’s 
own statements about the nature of 
these intervening variables, but Hil- 
gard avoids trouble by not attempt- 
ing to relate this formulation to 
empirical findings. 

Similarly, Iwahara (1957) carries
Hull’s characterization of Ir as a
negative drive and sIr as a negative
habit to what may seem the logical
conclusion in terms of the internal
consistency of Hull’s theory: that
the relationship between drives and
habits is always multiplicative and
never additive. Iwahara then goes a
step further to regard sIr as a con-
ditioned or secondary negative drive,
with Ir being the primary negative
drive. From this it follows that the
product of Ir × sIr should subtract
from positive drive, D, and should
also multiply sHr. Symbolically,

\[ {}_{S}E_{R} = {}_{S}H_{R} \times [D - (I_{R} \times {}_{S}I_{R})] \]

or, in expanded form,

\[ {}_{S}E_{R} = ({}_{S}H_{R} \times D) - ({}_{S}H_{R} \times I_{R} \times {}_{S}I_{R}) \]


Osgood (1953, p. 379) states that
Hull need not have postulated sIr
at all, since it might have been de-
rived from other postulates in the
system. If sIr is nothing other than
negative habit strength or the habit
of not responding (reinforced by the
dissipation of Ir), it would seem
logical to subtract sIr directly from
sHr. This is the formulation Osgood
has proposed (p. 349).

More recently, Jones (1958) has 
incorporated the foregoing sugges- 
tions in his revision of Hull’s equa- 
tion. The Jones version, which com- 
bines the properties of the other re-
visions (except Iwahara’s sHr × sIr)
and appears identical to Osgood’s 
suggestion, is as follows: 


\[ {}_{S}E_{R} = (D - I_{R}) \times ({}_{S}H_{R} - {}_{S}I_{R}) \]


That this formulation is quite rad- 
ically different from Hull’s is even 
more obvious when Jones mathe- 
matically expands the equation, thus: 


\[ {}_{S}E_{R} = (D \times {}_{S}H_{R}) - (I_{R} \times {}_{S}H_{R}) - (D \times {}_{S}I_{R}) + (I_{R} \times {}_{S}I_{R}) \]
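
The expansion can be checked numerically. The following is a minimal sketch in Python, assuming arbitrary illustrative values for D, sHr, Ir, and sIr (they are not data from any study); the function names are hypothetical. It confirms that the factored and expanded forms agree and lists the four terms with their signs.

```python
# Illustrative check of the algebraic expansion of the Jones (1958) formula.
# All numeric values are arbitrary assumptions, not experimental data.

def jones_factored(D, sHr, Ir, sIr):
    """Factored form: sEr = (D - Ir) x (sHr - sIr)."""
    return (D - Ir) * (sHr - sIr)


def jones_expanded_terms(D, sHr, Ir, sIr):
    """The four terms of the expanded form, each with its sign."""
    return [D * sHr, -(Ir * sHr), -(D * sIr), Ir * sIr]


values = dict(D=4.0, sHr=3.0, Ir=1.0, sIr=2.0)
factored = jones_factored(**values)
terms = jones_expanded_terms(**values)
expanded = sum(terms)

print("factored form:", factored)
print("expanded terms:", terms)
print("sum of expanded terms:", expanded)
```

In this sketch the last term, Ir × sIr, enters with a positive sign, so the product of the two inhibitory quantities adds to reaction potential rather than subtracting from it.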


Jones’ formulation has been sub- 
scribed to by Eysenck and his co- 
workers in their attempt to utilize 
Hullian postulates in developing a 
theory of personality (Eysenck, 1957; 
Kendrick, 1958). 

Another revision, rather casually 
suggested by Woodworth and Schlos- 
berg (1954, p. 668), is that inhibition 
(Ir or sIr or both?) should subtract
from “incentive motivation” (Hull’s
K, a function of the amount of rein-
forcement). Presumably the total
inhibitory potential İr (the sum of
Ir + sIr) subtracts from K, though
this point is not clear in the Wood- 
worth and Schlosberg discussion. 
Their suggestion might be expressed 
symbolically as follows: 


\[ {}_{S}E_{R} = (K - I_{R} - {}_{S}I_{R}) \times D \times {}_{S}H_{R} \]
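
To make the differences among the formulas concrete, the sketch below evaluates each of them on one set of inputs. It is only an arithmetic illustration in Python: the input values are arbitrary assumptions, the function names are hypothetical, and treating the intervening variables as plain numbers on a common scale is itself an assumption made for the example.

```python
# Each function transcribes one of the formulas quoted in the text.
# Inputs are arbitrary illustrative numbers, not measurements.

def hull(D, sHr, Ir, sIr):
    return (D * sHr) - (Ir + sIr)


def hilgard(D, sHr, Ir, sIr):
    return ((D - Ir) * sHr) - (Ir * sIr)


def iwahara(D, sHr, Ir, sIr):
    return sHr * (D - (Ir * sIr))


def jones(D, sHr, Ir, sIr):
    return (D - Ir) * (sHr - sIr)


def woodworth_schlosberg(D, sHr, Ir, sIr, K):
    return (K - Ir - sIr) * D * sHr


inputs = dict(D=4.0, sHr=3.0, Ir=1.0, sIr=2.0)
results = {
    "Hull": hull(**inputs),
    "Hilgard": hilgard(**inputs),
    "Iwahara": iwahara(**inputs),
    "Jones": jones(**inputs),
    "Woodworth-Schlosberg": woodworth_schlosberg(K=5.0, **inputs),
}

for name, value in results.items():
    print(f"{name:22s} sEr = {value}")
```

With identical inputs the five expressions return five different values, which is all the sketch is meant to show; it says nothing about which expression, if any, fits behavioral data.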


The most carefully formulated and 
empirically anchored modifications 
of Hull’s theory have been those of 
Spence (1956). His changes in the
inhibition part of the theory are of a
fundamentally different nature than
the other revisions. He has more or
less wiped the slate clean and started
anew by redefining inhibition and the
independent variables of which it is
a function. Spence’s extinctive in-
hibition (I_n) is not a function of the
amount of effort or rate of respond-
ing, as is Hull’s Ir, but is a function
only of the number of nonreinforced
responses. There is also an oscillatory
inhibition (I_o), which is the same as
Hull’s concept of oscillation (sOr).
The inhibition due to delay of re-
ward (I_t) is essentially the same as
I_n. The basis of this inhibition is as-
sumed to be the competing responses
that are established during the de-
lay period or during extinction. The
molar concepts of I_t or I_n simply
represent the quantitative effects of
these competing responses. Spence’s
inhibition does not interact with 
other intervening variables but only 
subtracts from the reaction potential. 
In this last respect his formulation is 
essentially no different from Hull's. 
It might be asked why D, if it is re- 
garded as an energizer of all re- 
sponses in the organism’s repertoire, 
should not interact with inhibition as 
Spence conceives of it, that is, as 
consisting of interfering or compet- 
ing responses. In this respect 
Spence’s theory of extinction is not 
unlike Guthrie’s. 

With the exception of Spence, 
these attempts to reformulate Hull 
raise a number of crucial questions 
in common, some of which must be 
critically examined on the level of 
theory and methodology and others 
in terms of empirical evidence. First 
there are questions of a general 
theoretical nature which must be con- 
sidered in relation to any attempt to 
criticize or reformulate Hull’s theory. 

1. Is the verbal formulation of
Hull’s theory to be taken more
seriously than the symbolic and 
quasi-quantitative formulations, or 
than the actual empirical relation- 
ships which formed the basis for 
Hull’s postulates and which he has 
held up as examples of the relation- 
ships he wished his system to predict? 

2. Does the algebraic manipulation 
of Hull’s intervening variables make 
sense theoretically and psycho-
logically? Are the functions repre-
senting their interrelationships “iso-
morphic” with the rules of simple
algebra? 

3. Can experiments be designed to 
determine the exact nature of the in- 
tervening variables? 

Once one has decided to argue 
within the Hullian framework a num- 
ber of questions arise from the at- 
tempts at reformulation, the an- 
swers to which must depend upon 
empirical findings. 

1. Does sIr subtract from sHr? 
Are sHr and sIr both basically the
same phenomenon, one merely being 
positive and the other negative in 
effect, or do they represent basically 
different processes? 

2. Is there any empirical evidence 
to support the following formula- 
tions? 

a. The interaction of D × sIr
(Jones, Osgood)

b. D - Ir (Hilgard, Jones, Os-
good)

c. The interaction of sHr × Ir
(Hilgard, Iwahara, Jones, Osgood)

d. The interaction of sHr × sIr
(Iwahara)

e. The interaction of Ir × sIr,
which paradoxically represents an
addition to reaction potential, the
multiplication of two negative quan-
tities making a positive (Hilgard,
Iwahara, Jones, Osgood)

f. K - Ir (Woodworth & Schlos-
berg)


THE LIMITATIONS OF HULL’S THEORY

In offering his revision, Jones 
(1958) points out that the inhibition 
aspect of Hull’s formula for reaction 
potential has been criticized by Koch 
(1954). Koch’s criticisms, however, 
apply equally to Jones’ revision as 
well as to all the others, with the
possible exception of Spence. Koch
points out that the intervening vari-
ables concerning inhibition in Hull’s
system, particularly sIr, are not
rigorously defined, are not clearly 
tied to experimental variables, and 
hence are indeterminate. Because of 
this, it is impossible to make rigorous 
experimental tests of Hull’s formula- 
tions or of the alternative revisions. 
Cotton (1955) has shown that a 
literal interpretation of Hull’s postu- 
lates leads to predictions that differ 
from the experimental data upon 
which Hull based the formulation of 
his postulates in the first place. In 
short, much of Hull’s theory does not 
even predict the very facts it was ex- 
pressly devised to predict. This is 
especially true with regard to the in- 
hibition postulates. None of the re- 
visions of Hull has improved this 
situation. They have merely rear- 
ranged in various ways the same in- 
determinate variables of Hull’s for- 
mula for sEr. 

Hull’s revisers have followed him 
in treating his intervening variables, 
D, sHr, Ir, sIr, etc., as if they were 
real, independent quantities whose 
laws of interaction are isomorphic 
with the rules of arithmetic and 
algebra. As we shall see, the manipu- 
lation of these hypothetical variables 
in such fashion can at times lead to 
absurdity. Hull’s intervening vari- 
ables are only intervening variables in 
the sense which MacCorquodale and 
Meehl (1948) have assigned to that 
term, and are defined only in terms 
of the independent and dependent
variables to which they are tied. The
danger arises when Hull’s revisers 
mathematically manipulate the in- 
tervening variables without regard 
for the defining experimental vari- 
ables which are actually all that give 
any meaning to the intervening vari- 
ables. Of course, one of the pur- 
ported virtues of intervening vari- 
ables is that they can be mathe- 
matically manipulated as independ- 
ent entities. But once the interven- 
ing variable has been properly de- 
fined, the question arises as to the 
nature of the mathematical opera- 
tions that can suitably be applied to 
it. It is highly doubtful if the exclu- 
sive use of linear algebra by Hull and 
his revisers is at all suitable. It 
should be noted that in Hull’s own 
statements (1943) the relationship 
between experimental variables and 
intervening variables is usually any- 
thing but linear. If the exact form of 
the functional relationship is not 
known, performing linear algebraic 
operations on the intervening vari- 
ables is practically meaningless. Un- 
der these conditions, for example, one 
cannot prove on the basis of experi- 
mental data whether changes in re- 
sponse strength are the result of an 
additive or a multiplicative rela- 
tionship between intervening vari- 
ables. From more fundamental con- 
siderations, Hilgard (1958) points 
out that Hull’s intervening variables 
cannot in their present form be mul- 
tiplied meaningfully, since they are 
not in comparable units of measure- 
ment. Certainly the least objection- 
able formula for reaction potential 
is also the least specific. Consequent- 
ly it has the least predictive power: 


\[ {}_{S}E_{R} = f(D, K, {}_{S}H_{R}, I_{R}, \text{etc.}) \]


In view of the facts here noted, great 
difficulties arise when Hull and his 
revisers become more explicit about
the nature of the relationships be-
tween these variables. 
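
The difficulty of telling an additive from a multiplicative relationship when the form of the measurement function is unknown can be illustrated with a small numerical sketch. It assumes, purely for illustration, two hypothetical variables that combine multiplicatively and a response measure that reflects either the product itself or its logarithm; the values and names are invented for the example.

```python
import math

# Hypothetical low and high levels of two variables (arbitrary values).
D = {"lo": 2.0, "hi": 4.0}
H = {"lo": 1.0, "hi": 3.0}


def interaction_contrast(cell):
    """2 x 2 interaction contrast: zero means the two factors look additive."""
    return (cell[("hi", "hi")] - cell[("hi", "lo")]
            - cell[("lo", "hi")] + cell[("lo", "lo")])


# Response measure proportional to the product D x H.
product = {(d, h): D[d] * H[h] for d in D for h in H}

# The same underlying product, observed through a logarithmic measure.
logged = {key: math.log(value) for key, value in product.items()}

raw_contrast = interaction_contrast(product)
log_contrast = interaction_contrast(logged)

print("contrast on the raw scale:", raw_contrast)
print("contrast on the log scale:", log_contrast)
```

On the raw scale the contrast is nonzero, as a multiplicative relation requires; on the logarithmic scale it vanishes up to rounding, so the same quantities appear strictly additive. Which pattern an experiment shows therefore depends on the assumed measurement function as much as on the rule of combination.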

Though it would not be in keeping 
with the spirit of Hull’s formal 
theorizing, some of the problems 
might be avoided if Hull's formula 
for sEr were regarded, not as a true 
mathematical equation, but merely 
as a kind of shorthand for expressing 
certain relationships suggested by 
empirical findings. The arithmetic 
signs of addition, subtraction, and 
multiplication in the formula would 
then not be taken too literally. Thus, 
E = H - I would not be taken to
mean that inhibition subtracts from
habit and that when E finally equals
zero, the habit has been removed and 
the organism restored to the same 
state as before the habit had been 
acquired. The equation merely 
states in shorthand form that reac- 
tion potential, as inferred from some 
measure of response strength, de- 
creases as the experimental proce- 
dures said to increase habit strength 
are removed and the conditions said 
to produce inhibition are applied. 
The subtraction sign is used here, not 
in a strict mathematical sense, but 
only as a shorthand expression for an 
experimental manipulation. Whether 
Hull has chosen to add or to multiply 
various intervening variables most 
likely has been a result of his attempt 
primarily to represent known em- 
pirical relationships rather than to 
maintain logical consistency within 
his theory. He most likely formu-
lated D × sHr, for example, because
he believed this interaction of habit
and drive represented the experi-
mental evidence. And most probably
the reason he did not formulate
D × sIr, even though his theory
seems to call for this logically, was 
simply because he found no evidence 
that suggests an interaction between 
drive and inhibition.

From the foregoing considerations,
probably the ultimate conclusion to 
which we are forced regarding the 
attempted revisions of Hull’s theory 
is not so much that these revisions 
are no improvement over Hull, but 
that it is futile to attempt to improve 
upon Hull by mere juggling of his in- 
tervening variables. Hullian theory 
will not be improved by continuing 
to work with the concepts of drive, 
habit, inhibition, etc. in exactly the 
same form they were given by Hull. 
The very building blocks of the 
theory, so to speak, are inadequate, 
and no amount of recombining them 
in new ways is likely to result in any 
substantial advance in learning
theory. 


REFORMULATIONS AND EMPIRICAL
EVIDENCE

sHr - sIr


While Hull (1943) refers to sIr as
a “negative habit,” there is no in-
dication in his writing that he re-
gards sIr as merely negative sHr.
The revisions suggested by Osgood
and by Jones are based on the as-
sumption that sHr and sIr are
basically the same phenomenon, sIr
merely being the negative counter-
part of sHr. Thus, if they are the
same process but merely opposite in
effect, it seems logical that one should
subtract from the other. Similarly,
if sHr interacts with drive, so should
sIr. Hull, however, quite clearly
did not regard sHr and sIr as
basically one and the same phenom-
enon, and his reasons are based on
experimental evidence that reveals
differences between the two. Pavlov
(1927) originally pointed out the
greater susceptibility of internal in-
hibition (of which sIr is one variety)
to external inhibition (i.e., disinhibi-
tion) than is the case with the ex-
citatory process corresponding to
Hull’s sHr. That sIr is more labile
and sensitive to external influences
than is sHr suggests that it is not
merely the negative counterpart of
the same phenomenon. Therefore,
Hull is consistent with Pavlov in not
subtracting sIr directly from sHr.

Another line of evidence that ex-
citation (conditioning) and inhibi-
tion (extinction) are basically dif-
ferent processes is well demonstrated
in a series of experiments by Reyn-
olds (1945a, 1945b), which showed
that acquisition of a conditioned re-
sponse is slower for massed than for
distributed trials, while the reverse
relationship holds for extinction.
Also a number of studies (Hilgard &
Marquis, 1940, p. 119) have shown a
negative correlation between the speed
of conditioning and of extinction.

The issue of whether the gen-
eralization gradients of excitation
(conditioning) and inhibition (ex-
tinction) are the same or different
was left undecided by Hull (1943, p.
265). The Bass and Hull (1934) and
Hovland (1937) studies referred to by
Hull were not adequate to answer
this question. Not finding evidence
to the contrary, Hull merely as-
sumed that the generalization grad-
ients of excitation and inhibition were
the same, which is a convenient as-
sumption in his theory of simple dis-
crimination learning (1943, p. 267)
based on the interaction of the
gradients of excitation and inhibi-
tion. On this point, however, there
is now some tentative evidence that
seems to contradict Hull’s assump-
tion. Liberman (1951) found that
extinction (sIr)¹ has broader transfer
effects than acquisition (sHr). Also
there is some evidence (Razran, 1938)
that the stimulus generalization of ex-
tinction (sIr) differs from that of ex-
citation (sHr), in that extinction
shows greater stimulus generaliza-
tion; the gradient of its generaliza-
tion contains fewer steps; the stimu-
lus generalization of extinction, un-
like that of acquisition, does not ex-
tend to heterogeneous CRs; and
generalization of extinction is more
affected by drugs than is generaliza-
tion of conditioning.

¹ In Hull’s system, though the entire proc-
ess of extinction is not explained in terms of
only sIr, but includes reactive inhibition
(Ir) as well, once extinction is complete, or
after enough time (probably 5 to 10 minutes)
has elapsed for the dissipation of Ir, extinc-
tion is conceived of as solely a function of the
relative magnitudes of the positive reaction
potential and sIr.

The formulation sHr - sIr seems
misleading in view of the fact that
successive periods of acquisition and
extinction become more rapid and
that an organism in which an ac-
quired response has been extin-
guished is not the same as an or-
ganism that had never acquired the
response. Razran (1956) has pointed
out that in a partially extinguished
CR there can be shown the co-
existence of two opposing processes,
positive and negative. “Even a
wholly extinguished CR bears, by all
signs, within itself a two-way CR con-
nection” (p. 42). Successive acquisi-
tion and extinction may be conceived
of as a kind of discrimination learn-
ing, in which both sHr and sIr grow
simultaneously, neither one diminish-
ing the other. The cessation of rein-
forcement becomes a cue, a condi-
tioned inhibitor, the strength of
which increases throughout succes-
sive extinction periods (Bullock &
Smith, 1953; Perkins & Cacioppo,
1950). This kind of discrimination
learning is likely to be a very primi-
tive kind of discrimination not in-
volving symbolic or mediating proc-
esses. Tentative evidence for this
opinion is found in the experiments
on spinal conditioning, which, how-
ever, are not yet entirely beyond dis-
pute as examples of true condition-
ing. Nevertheless, for what it is
worth, Shurrager and Shurrager
(1946) have reported that both con-
ditioning and extinction, measured at
a single synapse in a spinal prepara-
tion, become faster with successive
periods of conditioning and extinc-
tion.

Hull (1952, p. 114) also pointed
out that the delay CR (the “inhibi-
tion of delay” being due to sIr) is
eliminated by certain drugs, for ex-
ample, caffeine and benzedrine. It is
hard to see why the CR itself would
not be markedly weakened or elimi-
nated altogether if these drugs af-
fected both sHr and sIr in the same
manner. The CR is strengthened,
however, while the period of delay is
markedly shortened. Certain drugs
thus seem to have opposite effects on
sHr and sIr, suggesting again that
they represent essentially different
underlying physiological processes.
Skinner’s (1938, pp. 412-413) finding
that benzedrine and caffeine increase
the number of responses to a criterion
of extinction lends plausibility to the
idea that these drugs have different
effects on sHr and sIr. If sHr and
sIr were the same process, then a
drug increasing sHr would also in-
crease the inhibitory effect of each
nonreinforced response. If this were
the case, the unfailing effect of
stimulant drugs in increasing the
number of responses to extinction
could not easily be accounted for.
The evidence bearing on this subject,
however, is not crucial, in that we do
not have evidence regarding the per-
centage increase in responding during
extinction under benzedrine over the
operant level (preconditioning re-
sponse rate) under benzedrine. Also
it should be noted that the theoretical
problem hinges to some extent upon
the hypothesized relationship be-
tween excitation (or sHr) and inhibi-
tion (sIr); that is, whether it is
the absolute difference between the
two that matters or the ratio (or
“balance”) between excitation and
inhibition. In the Pavlovian system
it is the balance or ratio of excitation
to inhibition that determines reac-
tion potential. In Hull’s system it is
the absolute difference between sHr
(and the variables interacting with
it) and İr. A strictly Pavlovian re-
vision of Hull might take the follow-
ing form:


\[ {}_{S}E_{R} = \log \frac{D \times {}_{S}H_{R}}{\dot{I}_{R}} \]


Thus it is the balance between exci-
tatory and inhibitory processes that
is emphasized and not the absolute
difference. In this equation, when
the total inhibitory potential (İr) is
equal in strength to D × sHr, the
ratio of D × sHr/İr becomes 1.0, and
since log 1.0 = 0, the effective reac-
tion potential (sEr) will equal zero.
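
The contrast between a ratio rule and a difference rule can be sketched numerically. The Python fragment below is an illustration only: the values are arbitrary, the function names are hypothetical, and the use of the natural logarithm is an assumption (any base gives log 1.0 = 0).

```python
import math


def ratio_form(D, sHr, I_total):
    """Log-ratio form: sEr = log((D x sHr) / I_total); natural log assumed."""
    return math.log((D * sHr) / I_total)


def difference_form(D, sHr, I_total):
    """Difference form: sEr = (D x sHr) - I_total."""
    return (D * sHr) - I_total


# Total inhibition equal to excitation: both forms give zero.
balanced_ratio = ratio_form(2.0, 3.0, 6.0)
balanced_diff = difference_form(2.0, 3.0, 6.0)

# Doubling excitation and inhibition together leaves the ratio unchanged
# but doubles the absolute difference.
ratio_small = ratio_form(2.0, 3.0, 3.0)
ratio_large = ratio_form(4.0, 3.0, 6.0)
diff_small = difference_form(2.0, 3.0, 3.0)
diff_large = difference_form(4.0, 3.0, 6.0)

print("balanced:", balanced_ratio, balanced_diff)
print("ratio rule:", ratio_small, ratio_large)
print("difference rule:", diff_small, diff_large)
```

Both rules agree on the point of zero reaction potential; they part company as soon as excitation and inhibition are scaled up together, which is where the two systems would make different predictions.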

The fact that Eysenck and his co-
workers have subscribed to the Jones
revision would seem incompatible
with Eysenck’s (1956) theory con-
cerning the extinction of sIr. The
extinction of sIr is paradoxical and
inconsistent with other aspects of
Hull’s theory and also of Jones’ re-
vision. If, as maintained by Jones
and by Eysenck, sIr is merely nega-
tive sHr, then the mere lack of rein-
forcement of sIr (reinforcement be-
ing the dissipation or avoidance of
Ir) should not result in a decrease in
sIr. Lack of reinforcement does not
diminish the sHr already present, so
it should not diminish sIr either.
The notion that extinction is an ac-
tive process of an increasing inhibi-
tion (Ir) depressing performance
(sEr) is basic in Hull’s system. It,
therefore, seems absurd, while re-
maining in the Hullian framework,
to speak of the extinction of inhibi-
tion without first postulating a sec-
ond inhibitory process which de-
presses the first. Fortunately, there
is no experimental evidence at pres-
ent to suggest that such a complica-
tion would be necessary.


D × sIr


In Hull’s theory there is no inter-
action between drive and condi-
tioned inhibition. The D × sIr inter-
action, however, is explicit in a
number of the revisions. Since sIr is
the primary and essential intervening
variable accounting for experimental
extinction, we may well examine the
different predictions generated by
Hull and the revisions with respect
to the D × sIr interaction.

According to Hull, since D multi-
plies only sHr and not sIr, we should
predict that certain measures of ex-
tinction will be affected by changes
in D. With the Hullian formula
(D × sHr) - sIr, one can predict that
under a high drive level there will
be a greater number of responses to
extinction (m) than under low drive.
The same increment of sIr is gen-
erated by each response during ex-
tinction, regardless of the level of D,
while the positive reaction potential
(D × sHr) is increased by a higher
level of D. Not only does it follow
from Hull’s formula that a greater
number of responses is required for
extinction, but extinction curves
under high and low D should be
parallel. They approach the cri-
terion of extinction with the same
slope, but reach it at different points.

The revisions containing the
D × sIr interaction generate pre-
dictions that are exactly opposite to
the foregoing. If net reaction po-
tential is a resultant of (D × sHr) -
(D × sIr), then every increment of
sIr will be increased by D to the same
degree that sHr has been increased.
Consequently, there should be the
same number of responses to extinc-
tion under high drive as under low
drive. Also, the slopes of the extinc-
tion curves, as measured by, say,
rate of responding, would be different
under high and low drive. In other
words, the curves would approach
the criterion of extinction with dif-
ferent slopes, but would reach it at
the same point.

Fig. 1. The relationships between drive (D), number of trials to extinction (m), and effec-
tive reaction potential (sEr) as predicted by Hull’s formulation (left) and by Jones’s formula-
tion (right). [Figure not reproduced; horizontal axis: Number of Extinction Trials.]
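
The two sets of predictions can be sketched with a toy calculation. The Python fragment below is an illustration under stated assumptions, not a model taken from Hull or Jones: each nonreinforced response is assumed to add a fixed increment to sIr, Ir is ignored (as with widely spaced trials), the criterion of extinction is taken as a net reaction potential of zero, and all values and function names are hypothetical.

```python
def extinction_curve(D, sHr, increment, drive_multiplies_sIr):
    """Net reaction potential after 0, 1, 2, ... nonreinforced responses.

    Stops at the first value that reaches the criterion (taken here as zero).
    """
    curve = []
    responses = 0
    while True:
        sIr = responses * increment
        inhibition = D * sIr if drive_multiplies_sIr else sIr
        net = D * sHr - inhibition
        curve.append(net)
        if net <= 0:
            return curve
        responses += 1


def summarize(D, drive_multiplies_sIr, sHr=3.0, increment=1.0):
    """Return (responses to extinction, slope of the curve per response)."""
    curve = extinction_curve(D, sHr, increment, drive_multiplies_sIr)
    m = len(curve) - 1
    slope = curve[1] - curve[0]
    return m, slope


# Hull: (D x sHr) - sIr
hull_low = summarize(D=2.0, drive_multiplies_sIr=False)
hull_high = summarize(D=4.0, drive_multiplies_sIr=False)

# Revisions with the D x sIr interaction: (D x sHr) - (D x sIr)
revised_low = summarize(D=2.0, drive_multiplies_sIr=True)
revised_high = summarize(D=4.0, drive_multiplies_sIr=True)

print("Hull, low and high drive:", hull_low, hull_high)
print("D x sIr, low and high drive:", revised_low, revised_high)
```

Under these assumptions the Hullian form yields equal slopes and different numbers of responses to extinction for the two drive levels, while the D × sIr form yields different slopes and the same number of responses.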

If the proponents of the D × sIr
formulation object to the foregoing
predictions on the grounds that Ir
has not been taken into account, let
it be pointed out that sIr is essential
for complete extinction of the re-
sponse and that extinction can take
place with sufficiently spaced trials
to prevent the growth of Ir. If, as
Hull hypothesized (1943, pp. 300-
301), the formation of sIr is depend-
ent upon nonresponding being co-
incident with the dissipation of Ir,
extinction could not take place if all
Ir had dissipated in the interval be-
tween each presentation of the non-
reinforced CS. Yet extinction is
known to occur even with long inter-
trial intervals of 24 hours or more,
when Ir should supposedly have
been completely dissipated (Razran,
1956, p. 43). This, along with the
fact that in all of the revisions an
increment of Ir will reduce sEr by
the same proportion regardless of the
level of D, makes Ir irrelevant to the
present argument. (The D - Ir for-
mulation is discussed at a later
point.)

There is a considerable amount of 
experimental evidence bearing on the 
above predictions. The preponder- 
ance of evidence favors the Hullian 
formula and fails to support the no-
tion of a D × sIr interaction. Perin
(1942), working with rats, found a 
marked positive relationship between 
D (degree of hunger) at the time of 
extinction and the number of re- 
sponses required for extinction. 
Brandauer (1953) extinguished bar 
pressing in rats under three levels of 
drive (thirst) and found a positive 
relationship between strength of drive 
and number of responses during ex- 
tinction. Even under minimal dif- 
ferences in hunger drive (.5, 1, 2 
hours’ deprivation) Saltzman and 
Koch (1948) found highly significant 
differences in number of responses to 
extinction in a modified Skinner box. 
Brown (1956) also found that rats on 
high drive make more responses dur- 
ing extinction than those on low
drive. Cautela (1956) showed es-
sentially the same relationship for the 
extinction of a discrimination re- 
sponse. However, he found a slight 
decrease in m for levels of D beyond 
23 hours’ deprivation. He attributed 
this phenomenon to the generaliza- 
tion gradient of the drive stimuli; 
under the highest levels of D, the 
drive stimuli were further out on the 
generalization gradient from the 
drive conditions under which the 
original learning had occurred. The 
energizing and stimulus properties of 
drive are thus apt to interact in this 
type of experiment. 

In experiments with human sub- 
jects, where anxiety has been used as 
a measure of drive, a similar rela- 
tionship with extinction has been 
found. In one study, high anxiety 
subjects required almost twice the 
number of trials to extinguish the 
conditioned eyeblink as did low anx- 
iety subjects (Spence & Farber, 
1953). Bitterman and Holtzman 
(1952) obtained similar results in ex- 
tinguishing the PGR in high and low 
anxiety subjects. 

Skinner’s (1938) early notion of
the “reflex reserve” appears to be
consistent with the D × sIr formula-
tion. Skinner believed that the num-
ber of responses emitted during ex-
tinction was solely a function of the
number of previously reinforced re-
sponses and the schedule of rein-
forcement. Thus drive should not
affect m, but would affect only the
rate of emission of responses. The re-
flex reserve concept, however, has
long since been found unfruitful.
While theoretically it is probably not
a strictly testable hypothesis, it now
at least appears quite incorrect in
view of the evidence (Ellson, 1939).
Skinner’s (1938) original belief that
rate of responding, but not the num-
ber of responses in extinction (m),
is affected by drive is contradicted
by Bullock’s (1950) investigation
showing a correlation of .61 between
rate and m. This positive correlation
between response rate and number
of responses to extinction would cer-
tainly seem inconsistent with a
D × sIr formulation. If drive in-
creases response rate, sIr should in-
crease faster under higher drive, each
response adding the increment
D × sIr, thus leading to more rapid
extinction. The evidence is exactly
the contrary. Higher drive not only
increases the rate of response, but
also increases the total number of re-
sponses to a criterion of extinction.

The best available evidence also
indicates that the slope of the extinc-
tion curve is the same under high
and low drive, as would be predicted
from Hull’s theory. Sackett (1939)
showed that when the extinction
curves of two groups of rats, one
group extinguished under 6 hours’
hunger drive and the other under
30 hours’ drive, are Vincentized, the
forms of the two curves are almost
identical. The 30-hour group pro-
duced more responses to extinction
and required more time to extin-
guish, but the slope of the extinction
curve was the same as that of the 6-
hour group. Barry (1958) trained
rats in a running response and ex-
tinguished them under high and low
drive. The extinction curves were
parallel, and when drive was equal-
ized in both groups late in extinction,
the curves converged and were iden-
tical after three trials. When drive
was equal for both groups early in ex-
tinction, and then, later in extinc-
tion, the groups were run under high
and low drive, the extinction curves
diverged, and, after three trials, con-
tinued almost parallel, as would be
predicted from Hull. (The fact that
it took three trials, rather than one,
for the curves to converge or diverge
after the change in D, however, is
somewhat embarrassing to Hull’s
theory as it is also to the revision.)
Both these findings are consistent
with the (D × sHr) - sIr formulation
and not with (D × sHr) - (D × sIr). But
these experiments cannot be re-
garded as at all definitive in view of
the finding of Reynolds, Marx, and
Henderson (1952) of an interaction
between D and the incentive factor
K (a function of amount of reward).
This interaction plays havoc with
any theoretical conclusions drawn
from experiments on the effects of
drive on extinction in which the in-
centive factor has not been taken into
account. Reynolds et al. (1952) had
four groups of rats learn bar pressing
under all combinations of high drive-
low drive and large reward-small
reward. All animals were given ex-
tinction trials under equal drive. It
was found that


in those learning situations where a relatively 
large amount of reward is employed for rein- 
forcement, high D animals extinguish more 
readily than low D animals; and . . . where a 
relatively small reward is given per reinforce- 
ment, low D animals extinguish more readily 
than high D animals (pp. 41-42). 


Hull’s theory and its revisions gen-
erate conflicting predictions regard-
ing spontaneous recovery. In the
Jones (1958) formula, sEr = (D - Ir)
× (sHr - sIr), spontaneous recovery
could occur only if at the end of the
first set of extinction trials D - Ir = 0.
But this formulation would lead to
problems, since, if D - Ir = 0, no
habits at all could be activated tem-
porarily until some of the Ir had
dissipated, and no behavior of any
kind would occur after the end of the
first extinction period. We know
very well, however, that animals go
on behaving in various ways im-
mediately following the extinction of
a particular response. But then if
we do not wish to assume that D - Ir
is equal to zero immediately after the
first extinction period, we must as-
sume that sHr - sIr equals zero, or
extinction would not have occurred.
Yet if sHr - sIr were zero, there
could be no spontaneous recovery.
Conceivably one way out of this
dilemma for the Jones revision is to
make some assumptions about a re-
action threshold which must be ex-
ceeded before an overt response is
made. Thus, overt extinction could
occur before either D - Ir = 0 or
sHr - sIr = 0. Spontaneous recovery
would then result from the dissipa-
tion of Ir, as in Hull’s theory. If this
were true, one might predict from
the Jones revision that there would
be very little, if any, spontaneous re-
covery after extinction under high
drive, but greater amounts of spon-
taneous recovery after extinction
under low drive. Since D - Ir would
approach the threshold value quickly
where D is initially low, there would
result an appreciable increase in D,
and hence of response strength, with
the dissipation of Ir, and sponta-
neous recovery would result. Under
high drive D - Ir would not ap-
proach the threshold value as quickly
as would sHr - sIr. Thus, since
sHr - sIr would be a smaller value
after the first extinction, there should
be less spontaneous recovery at the
beginning of subsequent extinction
periods.

Different predictions may be made
from Hull and the D × sIr revision
concerning the effect of an increase in
drive after extinction is complete.
According to Hull’s (D × sHr) − sIr,
an increase in drive after complete
extinction should result in further
“spontaneous recovery.” According
to the D × (sHr − sIr) formulation,
once extinction is complete (i.e.,
sHr − sIr = 0), an increase in D
should not produce any “spontaneous
recovery.”
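
The contrast between the two formulas can be sketched numerically. The Python fragment below is only an illustration: the function names and every numerical value are assumptions introduced here, not quantities given by Hull or Jones. It sets sIr to whatever value makes extinction complete at a low drive and then raises D.

```python
def hull_potential(drive, habit, inhibition):
    """Hull's formula: sEr = (D x sHr) - sIr."""
    return drive * habit - inhibition


def revised_potential(drive, habit, inhibition):
    """Revised formula: sEr = D x (sHr - sIr)."""
    return drive * (habit - inhibition)


# Hypothetical values chosen only for illustration.
habit = 3.0
low_drive, high_drive = 2.0, 4.0

# "Complete extinction" at low drive: pick sIr so that sEr = 0.
hull_inhibition = low_drive * habit   # (D x sHr) - sIr = 0
revised_inhibition = habit            # sHr - sIr = 0

hull_after = hull_potential(high_drive, habit, hull_inhibition)
revised_after = revised_potential(high_drive, habit, revised_inhibition)

print("Hull, after drive increase:", hull_after)        # positive: recovery
print("Revised, after drive increase:", revised_after)  # zero: no recovery
```

With these assumed values the Hull expression becomes positive again when D is raised, while the revised expression stays at zero, which is the difference in prediction described above.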

Unfortunately, the experimental 
evidence bearing on all these predic- 
tions is meagre, conflicting, and in- 
conclusive. Hull (1943, p. 249) cites 
Pavlov’s finding that an increase in 
drive after extinction is complete 
causes the reappearance of the CR 
in the presence of the CS. This is, of 
course, consistent with Hull’s for-
mulation, but not with the D × sIr
formulation. The same phenomenon 
seems to occur also in instrumental 
conditioning. Jenkins and Daugherty 
(1951) extinguished a pecking re- 
sponse in pigeons under three levels 
of drive. They found that the num- 
ber of responses in extinction is a 
function of drive level and that when 
extinction was relatively complete an 
increase in drive caused gross re- 
covery of the conditioned behavior. 
The authors used the term “rela-
tively complete” extinction because
the pecking response in pigeons
never seems to be completely ex-
tinguished. But the recovery of a
“relatively extinguished” CR under
increased drive is certainly more con-
sistent with (D × sHr) − sIr than
with D × (sHr − sIr). The writer
knows of only one study that ap-
pears to contradict the finding of
Jenkins and Daugherty. Crocetti
(1952) found that when rats were
“completely” extinguished in a Skin-
ner box, increase in drive did not in-
crease the response rate over the pre-
conditioning response rate under the
higher level of drive. (Extinction 
was considered complete when the 
response rate became equal to the 
operant level prior to conditioning.) 
This finding is, of course, inconsistent 
with Hull’s (D × sHr) − sIr. Crocetti
did not control for the changes in
the drive stimulus (S_D) with in-
creased hunger, and so his finding is
not definitive with respect to the
present theoretical issue. If we as-
sume that sHr and sIr are condi-
tioned to S_D as well as to other
stimuli, then the changes in S_D from
the conditioning trials to the extinc-
tion trials or spontaneous recovery
trials become a crucial point in this
type of experiment. Fortunately in
an experiment by Lewis and Cotton
(1957) the effect of such changes in
S_D was taken into account. Three
groups of rats were trained in a run- 
ning response under three levels of 
drive, viz., 1, 6, and 22 hours’ food 
deprivation. Each group was then 
divided into three groups which un- 
derwent extinction under 1, 6, and 22 
hours’ drive. Extinction proceeded 
more rapidly under lower drive, as 
would be expected from Hull’s for- 
mulation, but drive level seemed to 
have no effect on the magnitude of 
spontaneous recovery, a fact which 
is inconsistent with (D × sHr) − sIr.
But the D × (sHr − sIr) revision can-
not comprehend both of these find-
ings either, for with this formulation
drive level should have no effect on 
number of trials to a criterion of ex- 
tinction. It seems obvious that 
clarification of the effects of drive on 
spontaneous recovery must await 
further experimentation which is 
specifically designed for this purpose 
and which takes into account both 
the energizing and stimulus proper- 
ties of drive. Some of the lack of 
consistency and agreement in this 
area may also be due to interspecies 
differences and to the use of different 
measures of response strength. La- 
tency, running time, response rate, 
and number of trials to extinction are 
used singly in different studies as 
measures of response strength even 
though they are far from being per-
fectly correlated. Each measure un-
doubtedly involves certain parame-
ters peculiar to itself. To use only
one such measure of response strength 
and only one species of animal is an 
inadequate method for testing a pre- 
cise deduction from a general be- 
havior theory. 

In the delayed CR, the inhibition
of the response during the period of
delay is attributed in the Hullian sys-
tem to sIr (Hull, 1952, p. 114). Con-
sistent with Hull’s formulation of
D × sHr − sIr is the fact that an in-
crease in D lessens or eliminates the
period of delay in the CR. The
D × sIr formulation does not accom-
modate this fact, but leads to an op-
posite prediction, i.e., an increase in
D should strengthen the inhibition of
delay.

One of the weakest points in Hull’s
system involves the dependence of
sIr upon Ir. It is no less troublesome
to any of the revisions. (Spence ex-
cepted, since his inhibition concept
has nothing in common with Ir.) It
is stated that Ir is generated when-
ever a response is made, the amount
of Ir being a function of the effortful-
ness of the response, and that Ir
rapidly dissipates, accumulating only
if responses follow one another in
rapid succession. The dissipation of
Ir, a “negative drive state,” rein-
forces the habit of not responding,
which is sIr. This hypothesis en-
counters obvious difficulties. If a re-
sponse is followed by the dissipation
of Ir, this would seem to have all the
requirements for reinforcing the re-
sponse, leading to increased response
strength rather than extinction.²
Also, subzero extinction would be un-
likely if increases in sIr were de-
pendent upon reactive inhibition (Ir).
And it is almost impossible to explain
the extinction of relatively effortless
CRs, such as salivation, eyeblink, and
the alpha rhythm, when the extinc-
tion trials are widely spaced. Pavlov
(1927, p. 76) obtained rapid extinc-
tion of the salivary CR using only one
presentation of the CS per day.
Razran (1956, p. 43) has reviewed
the evidence that contradicts a
theory of extinction based on reac-
tive inhibition. There are even cases
where spaced trials have led to more
rapid extinction than massed trials
(Sheffield, 1950; Stanley, 1952).
Kimble (1950) has argued from
studies of motor learning that a cer-
tain threshold or critical level of Ir
must be reached before sIr develops.
Motor learning experiments have
presumably shown that Ir can form
without leaving behind any sIr. This
is inconsistent with extinction based
on widely spaced trials. In fact, it
does not seem to the writer that the
Hullian inhibition postulates, as they
have been used in the field of motor
learning, represent the same processes
found in extinction phenomena. It
has been a case of giving the same
theoretical labels to basically dif-
ferent processes. The most funda-
mental difference between sIr in con-
ditioning and in motor learning has
to do with the amount of response
necessary to produce sIr. Five or six
minutes of pursuit rotor practice
seems necessary before sIr is in evi-
dence, while only a single condi-
tioned response, such as salivation,
PGR, or eyeblink, is evidently suf-
ficient to produce sIr. Thus it does
not seem that the sIr invoked in
theories of motor learning could be
the same sIr as that in Hull’s theory
of conditioning.


[Figure 2; horizontal axis: Deviations from Stimulus Reinforced]

Fig. 2. Illustrates algebraic summation the-
ory of discrimination. (Effective reaction po-
tential, sEr, is a result of subtraction of gen-
eralized extinction, sIr, from generalized
conditioning, sHr. See text for full explana-
tion.)


2 One can get around this problem, of course,
by invoking the gradient of reinforcement. If
the time required for the dissipation of Ir is
greater than the effective gradient of rein-
forcement, the foregoing proposition would
not hold true. At present there is no basis for
arguing the point. While Hull gives 20-30
seconds as the maximum delay between the
response and reinforcement if reinforcement is
to be effective, the time required for the dis-
sipation of Ir is solely a function of the
amount of Ir generated by the response and,
therefore, is variable, although the rate of dis-
sipation of Ir may not be variable. Perhaps an
even simpler way out is the idea that Ir leads
to a “resting response” which in turn is rein-
forced by the dissipation of Ir.

It is also held by Hull, and even
more explicitly by his revisers, that
the amount of sIr built up per trial is
related to the amount of Ir dis-
sipated, the dissipation acting as re-
inforcement for the negative habit,
sIr. But this is inconsistent with
Hull’s own revision of his theory
(Hull, 1951), in which the growth of
habit, sHr, and presumably also of
negative habit, sIr, is a function only
of the number of reinforcements and
not the amount of reinforcement.
None of these awkward predicaments
has been remedied by the revisions
here reviewed. Those revisions in-
sisting on the theoretical equivalence
of sHr and sIr as being merely posi-
tive and negative habits have re-
tained one of the weakest elements
in the Hullian system.


Discrimination learning. If dis-
crimination learning involves an in-
crease in habit strength to the posi-
tive stimulus (S^D) and an increase
in inhibition (Ir and sIr) to the
negative stimulus (S^Δ), then the ef-
fects of drive on discrimination
learning should be highly germane to
the plausibility of the D × sIr for-
mulation. Jones (1958) invokes
Spence’s (1937) theory of discrimina-
tion learning, adapted by Hull, in-
volving the overlapping generaliza-
tion gradients of sHr and sIr, in
support of the D × sIr part of his re-
vision. This theory is illustrated in
Figure 2. The discrimination would
be perfect (except for behavioral
oscillation) when the net reaction po-
tential resulting from subtracting the
generalized sIr from the generalized
sHr is some positive value for S^D and
zero for S^Δ, as in Figure 2. Jones
(1958) argues that, according to
Hull’s D × sHr − sIr, an increase in
D should obliterate the learned dis-
crimination. Since some discrimina-
tions are not obliterated or even
weakened by an increase in D, Jones
reasons that sIr must also be multi-
plied by D, so that the increase in
sIr will be proportional to the in-
crease of sHr when multiplied by D,
thereby preserving the discrimina-
tion.
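
Jones’ argument can be made concrete with a small numerical sketch of the algebraic summation model. Everything specific in the Python fragment below is an assumption made for illustration: the bell-shaped form of the gradients, their widths and peaks, the stimulus positions, and the function names. None of these is given by Spence, Hull, or Jones, and the true shapes of the gradients are not known.

```python
import math


def gradient(peak, center, width, x):
    """Bell-shaped generalization gradient (an assumed shape)."""
    return peak * math.exp(-((x - center) ** 2) / (2 * width ** 2))


# Hypothetical positions of the positive and negative stimuli.
S_POS, S_NEG = 0.0, 1.0
TRAINING_DRIVE = 1.0


def habit(x):
    """Generalized sHr, centered on the positive stimulus."""
    return gradient(10.0, S_POS, 1.0, x)


# Peak of sIr chosen so that net sEr at the negative stimulus is zero
# under the training drive, as in the Figure 2 diagram.
INHIBITION_PEAK = TRAINING_DRIVE * habit(S_NEG)


def inhibition(x):
    """Generalized sIr, centered on the negative stimulus."""
    return gradient(INHIBITION_PEAK, S_NEG, 0.5, x)


def hull_net(drive, x):
    """Hull: sEr = (D x sHr) - sIr."""
    return drive * habit(x) - inhibition(x)


def jones_net(drive, x):
    """Jones: sEr = D x (sHr - sIr)."""
    return drive * (habit(x) - inhibition(x))


for drive in (1.0, 2.0):
    print("D =", drive,
          "| Hull:", round(hull_net(drive, S_POS), 2), round(hull_net(drive, S_NEG), 2),
          "| Jones:", round(jones_net(drive, S_POS), 2), round(jones_net(drive, S_NEG), 2))
```

Under these assumptions, doubling D lifts the Hull value at the negative stimulus above zero, so that responding to it reappears, whereas the Jones value at the negative stimulus remains zero. This is the sense in which multiplying sIr by D is said to preserve the discrimination.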

Before Jones’ argument can be
evaluated, some clarification of the
Spence-Hull theory of discrimination
learning is necessary. In the first
place, there is often confusion con-
cerning whether discrimination learn-
ing is a matter only of the relative
strengths of sEr to the S^D and S^Δ,
or whether the formation of a dis-
crimination requires the reduction of
sEr to the S^Δ to zero or at least be-
low the operant level of the response,
i.e., below the strength of the re-
sponse before any conditioning or ex-
tinction has occurred. If the former,
then all that would be necessary for
discrimination to take place would
be that the S^D have greater sEr than
the S^Δ. The sEr to the S^Δ would not
necessarily have to undergo some de-
gree of extinction. If this were the
case, Jones’ use of the Spence-Hull
theory of discrimination, as illus-
trated in Figure 2, would not be ap-
plicable to the present argument con-
cerning the effects of drive on dis-
crimination learning. The evidence,
however, strongly suggests that the
sEr to the S^Δ must undergo some de-
gree of extinction for discrimination
to become nearly perfect. To this
extent, at least, the Spence-Hull
theory appears to be correct.

For example, Grice (1948) gave
one group of rats 200 rewarded trials
in responding to a disc 8 centimeters
in diameter and gave another group
of rats 200 rewarded trials in re-
sponding to a 5-centimeter disc.
Then both groups were given dis-
crimination training, with the 8-
centimeter disc as the S^D and the 5-
centimeter disc as the S^Δ. The group
which had been previously rewarded
on the 8-centimeter disc learned the
discrimination faster. Now if all
that were involved in discrimination
were the relative response strengths
to the S^D and S^Δ, the 8-centimeter
group should have learned to make
the discrimination immediately, since
response to the S^D had already been
rewarded on 200 trials, and the re-
sponse strength to the S^Δ resulting
from stimulus generalization would
have been less than the response
strength to the S^D. Since the learn-
ing curve for the acquisition of the
discrimination is very gradual, how-
ever, it suggests that extinction of the
response to the S^Δ through non-
reinforcement is necessary for the
learning of the discrimination.

An even more cogent demonstra-
tion of the necessity for extinction of
S^Δ in discrimination learning is an ex-
periment by Fitzwater (1952). Three
groups of rats were used: Groups A,
B, and C. In preliminary training
Group A was run an equal number of
times into each of two alleys having
differential cues (call them X and
Y, respectively). X was always rein-
forced; Y was never reinforced.
Group B was run an equal number
of times into each of two alleys hav-
ing the Cues X and Z. X was always
reinforced; Z was never reinforced.
Group C was run only into one alley,
with Cue X, the same number of
times as the other groups. Then dis-
crimination training began, with the
animals having to learn to discrimi-
nate X as the S^D and Y as the S^Δ.
Group A learned the discrimination
most rapidly, while Groups B and C
did not differ significantly in speed of
learning. The theoretical interpreta-
tion of these results is that Group A
had already built up inhibition to the
S^Δ, while Groups B and C had not.
Fitzwater concluded that

apparently in a visual discrimination task it is
about as important to establish an avoidance
habit as an approach habit, and that an ap-
preciable discrimination does not seem to oc-
cur if an approach habit is established alone
(p. 480).


The terms “approach habit” and
“avoidance habit” may be inter-
preted in the context of the present
discussion as excitation (or sHr) and
extinction (sIr), respectively. Thus
it is apparent that a decrease in sEr
to the S^Δ as well as an increase in
sEr to the S^D is necessary for dis-
crimination learning. It is not just a
matter of sEr to the S^D being rela-
tively greater than to the S^Δ.
Another experiment by Grice
(1949) offers further evidence that
discrimination depends upon the ex-
tinction of the response to the S^Δ and
not merely a relative difference in
response strengths between S^D and
S^Δ. One group of rats was trained in
a visual size discrimination with S^D
and S^Δ presented simultaneously,
and another group was trained on
the same discrimination with S^D and
S^Δ presented successively in random
order. Grice found no difference be-
tween the “simultaneous” and “suc-
cessive” groups in the rate of learn-
ing the discrimination. In both cases
learning apparently consisted of in-
creasing the response strength to the
S^D and completely extinguishing the
response to S^Δ. Furthermore it was
found that the rats which learned the
problem as a pair (i.e., simultaneous
presentation) responded differently
to the S^D and S^Δ when they appeared
singly, showing that even under
simultaneous presentation of the
S^D and S^Δ, the response to the S^Δ had
undergone extinction.

It is not maintained that complete
extinction of the response to S^Δ is
necessary. Extinction is a relative
matter and is probably best meas-
ured, not in relation to some theoret-
ical “absolute zero,” but in relation
to the “operant level” or probability
of occurrence of the particular re-
sponse before extinction trials have
been assumed to take place. In
the Grice (1949) experiment there
was a decrease in latency of response
to S^D and an increase in latency of
response to S^Δ, whether the two
stimuli were presented simultane-
ously or successively. sEr to the S^Δ,
as measured by latency, was con-
siderably less at the end of discrimi-
nation training than at the begin-
ning. In fact, extinction of response
to S^Δ may play a greater role in dis-
crimination learning than does the
strengthening of the response to S^D.
Webb (1950) trained rats to jump to
a black-white discrimination until it
was well learned. When, after train-
ing, only the S^D was presented to the
rats, the mean latency of their re-
sponse was 2.0 seconds, which was
not significantly less than the pre-
training latency. On the other hand,
when only the S^Δ was presented, the
mean latency of response was 80.5
seconds, which may be interpreted
as indicating considerable extinction
or inhibition of the response to the
S^Δ. If one defines the zero level of
sEr in the Hull-Spence model in
Figure 2 simply as the operant level
(i.e., the pretraining latency or prob-
ability of responding to the particu-
lar stimuli), then this model appears
to be quite consistent with the ex-
perimental evidence in showing that
discrimination depends upon extinc-
tion of the response to the S^Δ.

This model, however, seems to be 
deficient in some other respects. 
Hanson (1957), for example, per- 
formed a very careful experiment 
which led to the conclusion that 
over-all response strength is not 
weakened by discrimination train- 
ing, as would be predicted from the 
Spence-Hull model. (That is, since
the resultant sEr is the algebraic
sum of generalized excitation and in-
hibition, sEr to the S^D should be less
after discrimination training than it
would be in simple conditioning to a
single stimulus.) Hanson concluded
that


the major result of discrimination training is
to bring a large proportion of the responses
available in extinction under the control of
another range of stimuli, those which do not
ordinarily gain control of the response as the
result of simple conditioning without differ-
ential reinforcement (p. 889).


This conclusion is not compatible 
with the Spence-Hull theory. 

It may be argued that Jones has
taken the Spence-Hull diagram (Fig-
ure 2) too literally. Very little is
known about the actual shapes of the
generalization gradients of sHr and
sIr, and until a proper metric is
worked out, arguments over this
point cannot be settled. What little
evidence there is, though far from
conclusive, suggests that the gen-
eralization gradients of excitation
and inhibition are probably different
in a number of respects (Razran,
1938). Furthermore, the amount of
overlap of the gradients of excitation
and inhibition will depend on the dis-
tance apart of S^D and S^Δ, and there is
reason to believe that the effects of
drive on discrimination will interact
with the degree of disparity between
S^D and S^Δ (Broadhurst, 1957). We
would predict from Hull’s D × sHr
− sIr that the farther apart S^D and
S^Δ are, the less deleterious to the dis-
crimination are the effects of in-
creased drive. This essentially is the
Yerkes-Dodson Law (Yerkes & Dod-
son, 1908), which, in its most general
form, states that the optimum mo-
tivation for a learning task decreases
with increasing difficulty. This rela-
tionship between drive and difficulty
of discrimination, however, cannot be
predicted from the Jones formulation
of D × (sHr − sIr).

Rather than arguing from a highly 
hypothetical model involving the 
relative shapes and magnitudes of 
the generalization gradients of sHr
and sIr, as Jones has done, we can
better make predictions concerning 
the directly observable effects of in- 
creased drive on discriminations. 
What is the effect of drive on the 
initial learning of a discrimination, 
and does an increase in drive have a 
different effect on the learning of easy 
and difficult discriminations, as de- 
termined by time required for learn- 
ing? What is the effect of change in 
drive on discriminations that are 
already established? What effect does
a change in drive have on the extinc-
tion of a discrimination?

In discrimination learning, since
the relative amounts of sHr and
sIr built up to the S^D and S^Δ are dif-
ferent, we would expect from the
D × (sHr − sIr) formula that an in-
crease in D would always have a
facilitative effect on learning a dis-
crimination. The degree of facilita-
tion would depend upon the degree of
difference between S^D and S^Δ. If we
assumed considerable overlapping of
generalization gradients, then there
would be relatively little effect of an
increase in D. If the discrimination
were easy, increases in D should im-
prove the discrimination, since the
relatively greater sHr to the S^D and
the relatively greater sIr to the S^Δ
would both be multiplied by D. In
no case should discrimination be
weakened by an increase in D.

On the other hand, if we assume
that response to S^Δ must undergo
extinction for a discrimination to be
learned, Hull’s formula D × sHr − sIr
leads to quite different predictions,
viz., that increase in D should weaken
difficult discriminations, where one
might assume overlap of the stimulus
generalization gradients, but would
strengthen discriminations in which
S^D and S^Δ are widely separated on
the generalization gradient.

What is the evidence? We have al-
ready mentioned the Yerkes-Dodson
Law, which is possibly consistent
with Hull, but certainly not with the
D × (sHr − sIr) formula. Broad-
hurst (1957) has demonstrated this
“law” most effectively, using rats in
a brightness discrimination problem
and manipulating drive by means of
oxygen deprivation. Skinner (1938,
p. 188) has observed that it is im-
portant in establishing discriminant
operant conditioning to keep the
hunger drive as constant as possible,
for changes in drive disturb the dis-
crimination. More explicitly, Teel
(1952) has shown that in selective
learning, in which correct responses
are reinforced and incorrect responses
are nonreinforced or extinguished,
rats under high drive (food depriva-
tion) require a greater number of
trials to reach a criterion of learning
than rats under low drive. One can-
not predict these facts from the
D × (sHr − sIr) formula. In fact,
just the opposite outcome would be
predicted for the Teel experiment.
With human subjects, Hilgard, Jones,
and Kaplan (1951) found that high
anxiety subjects (on Taylor Manifest
Anxiety scale) had greater difficulty
than low anxiety subjects in forming
a discriminatory CR. It is well-
established that anxious subjects de-
velop simple eyeblink CRs more
readily than nonanxious subjects.
(This relationship has not been found
to hold, however, for autonomic
CRs.) Interpreting anxiety as a
drive, both sets of findings are con-
sistent with Hull, but not with
D × (sHr − sIr). An experiment by
Spence and Farber (1954) found that
the difference between high and low
anxious subjects in forming a dis-
criminatory response showed up only
on the S^D but not on the S^Δ. That is,
D (anxiety) seemed to affect only the
CS (i.e., S^D) associated with rela-
tively greater sHr and not the CS
(i.e., S^Δ) associated with relatively
greater sIr. Spence interprets this
finding as evidence that D interacts
with excitation (sHr) but not with
inhibition (sIr).

In a well-established discrimina-
tion, in which S^D and S^Δ are rela-
tively far apart on the stimulus gen-
eralization gradient, and in which
relatively more sHr than sIr has
been built up to S^D than to S^Δ, and
relatively more sIr built up to S^Δ
than to S^D, we would predict from
D × (sHr − sIr) an improvement in
the discrimination with an increase
in drive. That is, the ratio of number
of responses to S^D to number of re-
sponses to S^Δ should increase, since
response to S^D is increased by
D × sHr, and inhibition of response
to S^Δ is increased by D × sIr. Dins-
moor (1952) performed an experi-
ment bearing on this point. A simple
discrimination habit was well-estab-
lished in rats in the Skinner box, with
S^D being the presence of light and S^Δ
being total darkness. When D was
increased to varying degrees by food
deprivation, the number of responses
per unit of time to both S^D and S^Δ
increased, but the ratio of S^D and S^Δ
responses remained exactly the same
at seven different degrees of hunger.
In short, the discrimination was not
improved by an increase in D.
Though Hull’s theory is not suf-
ficiently quantified to have precisely
predicted the outcome of this par-
ticular experiment (because absolute
levels of D and sHr as well as the
jnd’s between S^D and S^Δ must be
taken into account), at least the re-
sult is consistent with the (D × sHr)
− sIr formulation.

There is other experimental evi-
dence, however, which suggests that
both the Hullian and the revised for-
mulations are inadequate to explain
the effects of drive on discrimination
learning. A number of studies have
found no relationship at all between
drive and proficiency in selective
learning or solving discrimination
problems (Meyer, 1951; Miles, 1959;
and a number of doctoral disserta-
tions reported by Spence, Goodrich,
& Ross, 1959). Spence et al. (1959)
have scrutinized the conflicting find-
ings in this field with a view to dis-
covering the reason for the lack of
agreement between various investiga-
tions on the effect of drive on selec-
tive learning and discrimination.
They arrived at the hypothesis that
performance in selective learning
(such as learning a black-white dis-
crimination) is independent of drive
level when responses to the S^D and
S^Δ are equated, but varies with drive
when responses are not equated.
They performed a set of experiments
which supported this hypothesis.
The results are inexplicable in terms
of Hull’s theory or any of its revisions
except that of Spence. These findings
suggest that the growth of sHr is not
a function of number of reinforced
responses, as in Hull’s system, but is a
function merely of the number of
responses, whether reinforced or not.
The growth of inhibition is a function
only of the number of nonreinforced
trials. This formulation will account
for the major finding of the experi-
ment by Spence et al. (1959). But
another aspect of their findings re-
mains inexplicable in terms of any
current theory of learning. When
responses to S^D and S^Δ were equated,
an increase in drive increased the
response strength to both the S^D and
S^Δ. But when the rats were forced to
respond twice as often to S^D as to S^Δ,
an increase in drive increased the
response strength to S^D but decreased
response strength to S^Δ. Spence et al.
concluded that


the results of the two (experiments) are in 
fundamental disagreement so far as the effects 
of drive differences on the strength of nonrein- 
forced responses are concerned. It is perhaps 
obvious that we need to obtain much more 
knowledge than we now possess concerning 
the variables affecting the development of re- 
sponse decrement with nonreinforcement. 
Unfortunately, this problem has been badly 
neglected in conditioning experiments with 
the consequence that such an empirically 
based theory as the present one [i.e., Spence’s 
theory] is weakest in this area (p. 15). 


Though the present state of our
knowledge in this area does not per-
mit any definite conclusion regarding
the effects of drive on discrimination 
learning, it appears that no current 
theory is able to comprehend all the 
relevant facts now available. 

But now let us ask: What happens 
when a discrimination is extinguished 
under various levels of drive? Cau- 
tela (1956) trained rats in a discrimi- 
nation under 23 hours’ food depriva- 
tion and then extinguished the dis- 
criminative response under 0, 6, 12, 
23, 47, and 71 hours’ deprivation. 
The criterion of extinction was failure
to respond to either S^D or S^Δ within 3
minutes. Many more responses were
required for extinction under high
drive levels (23, 47, or 71 hours’
deprivation) than under low drive (0,
6, or 12 hours). This result can be
predicted from D × sHr − sIr. On the
other hand, it is difficult to see why a
change in drive should have any
effect on the number of responses to
extinction if sHr and sIr are both
increased or decreased proportion-
ately by changes in D, as stated in the
revised formula.
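
The point about responses to extinction can be illustrated with a toy calculation. The Python sketch below rests on assumptions that are not in the sources reviewed: each nonreinforced response is taken to add a fixed increment to sIr, the reaction threshold is taken to be zero, and all names and numbers are hypothetical.

```python
def hull_potential(drive, habit, inhibition):
    """Hull's formula: sEr = (D x sHr) - sIr."""
    return drive * habit - inhibition


def revised_potential(drive, habit, inhibition):
    """Revised formula: sEr = D x (sHr - sIr)."""
    return drive * (habit - inhibition)


def responses_to_extinction(potential, drive, habit, increment=1.0, limit=10_000):
    """Count nonreinforced responses until sEr is no longer positive.

    Assumes, purely for illustration, a fixed increment of sIr per
    response and a reaction threshold of zero.
    """
    inhibition = 0.0
    responses = 0
    while potential(drive, habit, inhibition) > 0 and responses < limit:
        inhibition += increment
        responses += 1
    return responses


HABIT = 5.0  # hypothetical sHr at the start of extinction
for drive in (2.0, 4.0):
    print("D =", drive,
          "| Hull:", responses_to_extinction(hull_potential, drive, HABIT),
          "| revised:", responses_to_extinction(revised_potential, drive, HABIT))
```

In this sketch the count under Hull’s formula doubles when D doubles, while the count under the revised formula does not change, since extinction there depends only on sIr reaching sHr.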


D − Ir


Since Hull referred to reactive
inhibition (Ir) as a “negative drive,”
he has been accused of logical incon-
sistency for adding a drive to a habit
(i.e., Ir + sIr) and the suggested
remedy has been the obvious one,
viz., to subtract Ir from D. But pre-
dictions from this formulation lead to
empirical embarrassment. For ex-
ample, when extinction is carried out
under massed trials, and, after a
period of rest, there is some spon-
taneous recovery, we must assume,
according to the sEr = (D − Ir) × (sHr
− sIr) formulation, that D − Ir = 0
at the end of the first extinction
period. For there would be no spon-
taneous recovery if it were sHr − sIr
that had become equal to zero. Yet,
according to Hullian theory (includ-
ing the revisions), no behavior can
occur unless D is greater than zero.
And it is known that an animal at the
end of extinction is far from being
inactive. Only the extinguished CR
becomes inactive, while other behav-
ior in the animal’s repertoire is im-
mediately evident. Theoretically
this could not be so if the drive com-
ponent in the equation for reaction
potential were zero.
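
The difficulty can be shown with a few made-up numbers. In the Python sketch below the repertoire of responses, their sHr and sIr values, and the drive level are all hypothetical; only the form of the equation, sEr = (D − Ir) × (sHr − sIr), is taken from the formulation under discussion.

```python
def jones_potential(drive, reactive_inhibition, habit, conditioned_inhibition):
    """Revised formula: sEr = (D - Ir) x (sHr - sIr)."""
    return (drive - reactive_inhibition) * (habit - conditioned_inhibition)


# Hypothetical repertoire: (sHr, sIr) for each response.
repertoire = {
    "bar press (extinguished)": (5.0, 2.0),
    "grooming": (3.0, 0.0),
    "exploring": (4.0, 0.0),
}
drive = 6.0

# End of massed extinction: Ir is assumed to have grown until D - Ir = 0.
end_of_extinction = {
    name: jones_potential(drive, 6.0, habit, inhibition)
    for name, (habit, inhibition) in repertoire.items()
}

# After rest: Ir is assumed to have dissipated completely.
after_rest = {
    name: jones_potential(drive, 0.0, habit, inhibition)
    for name, (habit, inhibition) in repertoire.items()
}

print(end_of_extinction)  # every response is at zero, not only the bar press
print(after_rest)         # the bar press recovers once Ir has dissipated
```

Because Ir is subtracted from the common drive term, setting D − Ir to zero silences every response in the sketch, which is the consequence the text describes as contrary to observation.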

Experimental evidence contradict-
ing D − Ir is presented by Hull (1952,
p. 50). A rat is trained to press either
of two bars in different locations in a
Skinner box to obtain food. During
extinction the rat alternates its re-
sponse from one bar to the other. Ir
does not have to dissipate before the
alternate bar can be pressed. This
strongly suggests that Ir must be
associated with the particular re-
sponse, rather than cause a diminu-
tion in the total drive state, which in
the Hullian system is an amalgam of
all the organic needs of the moment
and their associated “drive stimuli”
(S_D).

In an experiment highly relevant to
this point, Smith and Hay (1954)
took advantage of the great sensi-
tivity to changes in drive of rate of
responding in the Skinner box. As
soon as operant conditioning had led
to a stable response rate, a discrimi-
natory stimulus was introduced, the
S^D always being reinforced, the S^Δ
never. During the learning of the dis-
crimination, the number of responses
to S^D increased while the number of
responses to S^Δ decreased, but the
rate of responding remained constant.
If the extinction of S^Δ had involved
D − I_R, there should have been the
decrease in over-all response rate
which is associated with lowered
drive. On the other hand, this finding
is entirely consistent with Hull's
formulation.
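
The contrast drawn in the two-bar example can be put in a small numerical sketch. It is illustrative only: the function names and all numerical values are hypothetical, and Hull's formulation is assumed here to be read as D × sH_R − (I_R + sI_R), with inhibition tied to the particular response, whereas the D − I_R revision removes inhibition from the drive shared by every response.

```python
def hull_net_potential(D, sHR, IR, sIR):
    """Assumed reading of Hull: response-specific inhibition is subtracted."""
    return D * sHR - (IR + sIR)


def drive_reduction_potential(D, sHR, IR_total, sIR):
    """The criticized revision: total I_R is subtracted from drive D."""
    return (D - IR_total) * (sHR - sIR)


# Hypothetical state just after a run of unreinforced presses on bar A.
D = 1.0
sHR = {"bar_A": 0.8, "bar_B": 0.8}
IR = {"bar_A": 0.9, "bar_B": 0.0}   # reactive inhibition built up on bar A only
sIR = {"bar_A": 0.0, "bar_B": 0.0}

# Hull: only the inhibited response is weakened.
hull = {bar: hull_net_potential(D, sHR[bar], IR[bar], sIR[bar]) for bar in sHR}

# D - I_R: the same reduced drive multiplies every habit.
total_IR = sum(IR.values())
revised = {bar: drive_reduction_potential(D, sHR[bar], total_IR, sIR[bar])
           for bar in sHR}

for bar in sHR:
    print(bar, round(hull[bar], 3), round(revised[bar], 3))
```

Under these assumed values the response-specific reading leaves the alternate bar at full strength, while the D − I_R reading depresses both bars equally, which is the outcome the alternation data are said to contradict.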


I_R × sI_R


Here we have a formulation which,
if the rules of algebra are followed
religiously in manipulating Hullian
variables, leads to a paradox: a
positive addition to reaction poten-
tial resulting from the interaction of
two inhibitory variables. Jones
(1958) even goes on to say that the
paradoxical outcome of I_R × sI_R in-
creasing sE_R might explain the
"ultraparadoxical effect" described
by Pavlov (1927). This might be
called explanation by clang associa-
tion.³ It is difficult for the writer to
understand why Jones and other
revisers have so gratuitously regarded
the minus sign as being permanently
attached to I_R and sI_R. Though these
quantities are subtracted from posi-
tive reaction potential, the negative
sign is not necessarily an inherent
part of these inhibition variables.
Even if I_R and sI_R were multiplica-
tively related, there is no reason why
their product could not be subtracted
from the positive reaction potential.
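
The source of the paradox can be checked numerically. The sketch below assumes that the paradoxical term arises from expanding the product form (D − I_R) × (sH_R − sI_R) quoted earlier; the function names and values are hypothetical, and the last function is only one way of writing the alternative mentioned above, in which the product of the inhibitions is subtracted.

```python
def product_form(D, sHR, IR, sIR):
    """Revision under discussion: (D - I_R) * (sH_R - sI_R)."""
    return (D - IR) * (sHR - sIR)


def expanded_form(D, sHR, IR, sIR):
    """Term-by-term expansion; note the positive I_R * sI_R cross term."""
    return D * sHR - D * sIR - IR * sHR + IR * sIR


def subtractive_product(D, sHR, IR, sIR):
    """Alternative noted in the text: the product of inhibitions is subtracted."""
    return D * sHR - IR * sIR


values = dict(D=2.0, sHR=1.5, IR=0.5, sIR=0.25)   # hypothetical magnitudes
cross_term = values["IR"] * values["sIR"]

print(product_form(**values), expanded_form(**values))
print("positive cross term:", cross_term)
print("subtractive alternative:", subtractive_product(**values))
```

The cross term is positive only because the minus signs were carried along with the inhibition variables during multiplication; treating I_R and sI_R as unsigned magnitudes and subtracting their product removes the paradox.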

The empirical evidence regarding
the I_R × sI_R interaction is far from
satisfactory, for there is always an
"out" via the possible interaction of
all the other intervening variables in
the system. But in terms of sheer
plausibility (and that is all one can
go on at present) it must be said that
I_R × sI_R is a weak formulation. The
only relevant evidence comes from
experiments on motor learning, the
one area in which there are rather
clear-cut operational definitions of
what constitutes I_R and sI_R. In
general, performance decrement that
dissipates during rest is identified
with I_R; the decrement that still
remains after rest is identified with
sI_R.


³ The "paradoxical" and "ultraparadoxical"
effects observed by Pavlov, in which a weaker
intensity of the CS will elicit a CR that had
been extinguished to a stronger intensity of
the CS, are probably best explained in terms
of a generalization gradient on the stimulus
intensity dimension. Because of the gradient,
extinctive inhibition built up to a CS of one
intensity will not be sufficient to inhibit the
CR to a CS of a different intensity, even
though it be weaker. Or the effect may be ex-
plained as disinhibition caused by a "novel"
stimulus (novel because the intensity is
weaker than that of the original CS).

Duncan (1951) gave two groups of
subjects massed and distributed prac-
tice on the pursuit rotor. During this
5-minute practice period, the massed
group presumably would develop
more I_R and hence more sI_R. Then
both groups were allowed 10 minutes
of rest, so that at the beginning of the
postrest trials, nearly all I_R should
have dissipated, leaving the two
groups differing only in sI_R. The
postrest trials were massed for both
groups. Here exist the very condi-
tions which should allow an I_R × sI_R
interaction to show itself. If there
were an interaction, the postrest
performance curves of the two groups
should diverge. In fact, they did not
diverge, or converge, but ran exactly
parallel throughout the postrest trials,
which suggests an additive rather than
multiplicative relationship between
I_R and sI_R. There are certain weak-
nesses and peculiarities in Duncan's
study (for example, it could be argued
that the 5 minutes' practice was not
sufficient to attain the threshold of I_R
necessary for the development of sI_R,
the evidence for which has been pre-
sented by Kimble, 1950); but on the
whole, it favors Hull's formulation
regarding inhibition more than it
favors those formulations which in-
volve I_R × sI_R. Another study by
Starkweather and Duncan (1954) was
essentially the same as the previous
experiment except that the massed
group was given more prerest prac-
tice so that performance on the first
postrest trial would be the same for
both massed and distributed groups.
The rest period was 24 hours. Again,
when both groups were given massed
practice after the rest, their perform-
ance curves were approximately par-
allel, suggesting that there is no
interaction between I_R and sI_R. It is
possible to argue from some of the
evidence in this study, however, that
the presence of sI_R was not clearly
demonstrated.⁴
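
The logic of the parallel-curves test can be sketched as follows. This is a minimal illustration under stated assumptions, not a model of the actual data: the two groups are assumed to differ only in a fixed amount of sI_R after rest, I_R is assumed to grow by equal steps over the massed postrest trials, and all names and values are hypothetical.

```python
def decrement_additive(IR, sIR):
    """Total decrement if I_R and sI_R simply add."""
    return IR + sIR


def decrement_multiplicative(IR, sIR):
    """Total decrement if I_R and sI_R interact multiplicatively."""
    return IR * sIR


# Hypothetical values: I_R rebuilds over five massed postrest trials.
IR_by_trial = [0.1 * trial for trial in range(1, 6)]
sIR_massed, sIR_distributed = 0.6, 0.2   # groups differ only in sI_R

gap_additive = [
    decrement_additive(ir, sIR_massed) - decrement_additive(ir, sIR_distributed)
    for ir in IR_by_trial
]
gap_multiplicative = [
    decrement_multiplicative(ir, sIR_massed)
    - decrement_multiplicative(ir, sIR_distributed)
    for ir in IR_by_trial
]

print("additive gap per trial:      ", [round(g, 3) for g in gap_additive])
print("multiplicative gap per trial:", [round(g, 3) for g in gap_multiplicative])
```

With an additive relation the difference between the groups stays constant from trial to trial (parallel curves); with a multiplicative relation it widens as I_R accumulates (diverging curves).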

Better evidence is presented by
Bourne and Archer (1956). Groups
trained under massed and distributed
practice on the pursuit rotor were
given 5 minutes' rest, and then all
groups performed under massed con-
ditions. The performance curves
converged in the postrest period. But
the convergence consisted of the
performance of the previously dis-
tributed group reducing to that of the
massed group. If the I_R × sI_R formu-
lation were correct, the result should
have been just the opposite, with the
previously massed group showing an
increase up to the level of the distrib-
uted group. The prerest practice
was more prolonged in this study
than in Duncan's, and it can be
argued that there was a sufficient
amount of sI_R generated to permit
the I_R × sI_R interaction to show
itself. Yet, in
another motor learning experiment
specially designed to determine if
there was an interaction between I_R
and sI_R, Bowen, Ross, and Andrews
(1956) failed to find any evidence of
interaction. So while the evidence is
not definitive on this point, the pre-
ponderance of it does not favor the
I_R × sI_R formulation. The issue,
however, does not seem beyond a
clear-cut experimental test. For
example, in the Jones revision D × sI_R
would always have to be greater than
I_R × sI_R, because there can be no
performance when D is equal to or
less than I_R. If this were true, a
person practicing on the pursuit rotor
over a long period should finally
become unable to perform, since sI_R
would continue to grow and inhibit
performance. After I_R had dissi-
pated, D × sI_R would approach or
equal D × sH_R, and the subject would
be unable to perform the pursuit task.
Gleitman, Nachmias, and Neisser
(1954) were the first to point out this
consequence with respect to Hull's
formulation. As far as the writer
knows, no one has ever found this
kind of "extinction" of the pursuit
rotor skill. Subjects have been known
to practice the pursuit task day after
day for months, long after having
reached an asymptote for time on
target, yet they show no loss of the
skill. Hull's formula, on the other
hand, can get around this problem,
the arguments of Gleitman et al.
(1954) notwithstanding. If sH_R and
sI_R both reach an asymptote (Hull,
1951), extinction will have occurred
when sI_R = D × sH_R. An increase in D
will make it possible for D × sH_R to be
greater than the asymptote of sI_R, so
that extinction need never occur if D
remains sufficiently high. Indeed,
there are instances (Solomon
& Wynne, 1954) of absence of extinc-
tion in escape and avoidance training
in which the drive is a very strong
shock-induced fear reaction.


⁴ It seems fairly certain that the concept of
sI_R invoked to explain decremental phenom-
ena in motor learning could not represent the
same process as the sI_R involved in experi-
mental extinction.
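
The asymptote argument made for Hull's formula can be illustrated with a short sketch. It assumes, as the text does, that I_R has dissipated after rest, so that net reaction potential is D × sH_R − sI_R, and that extinction corresponds to this quantity falling to zero or below. The asymptote values and function names are hypothetical.

```python
def net_potential(D, sHR, sIR):
    """sE_R after rest (I_R dissipated), on the reading assumed here."""
    return D * sHR - sIR


def ever_extinguishes(D, sHR_asymptote, sIR_asymptote):
    """True if the response is extinguished once both terms are at asymptote."""
    return net_potential(D, sHR_asymptote, sIR_asymptote) <= 0


# Hypothetical asymptotes for habit strength and conditioned inhibition.
SHR_ASYMPTOTE = 1.0
SIR_ASYMPTOTE = 1.5

for drive in (1.0, 2.0):
    print(f"D = {drive}: extinguishes ->",
          ever_extinguishes(drive, SHR_ASYMPTOTE, SIR_ASYMPTOTE))
```

With the lower drive the asymptote of sI_R exceeds D × sH_R and the response extinguishes; doubling D reverses the inequality, so extinction need never occur, as in the avoidance-training instances cited.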

The unlikely prediction made from 
Hull’s theory by Gleitman et al. 
(1954) that any response, even though 
always reinforced, would eventually 
extinguish if it were repeated often 
enough was directly tested in experi-
ments by Calvin, Clifford, Clifford,
Bolden, and Harvey (1956) and
Kendrick (1958). Their studies differ 
in a few details of experimental pro- 
cedure. Essentially they ran rats 
down a long alleyway at the end of 
which the rats received reinforcement 
on every trial. After some hundreds of 
trials (spread over many days) all the 
rats ceased running down the alley; 
they would not leave the starting box 
for a specified period of time desig- 
nated as the criterion for ‘‘complete”’ 
extinction. Though this outcome 
lends support to Hull’s theory, other 
interpretations are certainly possible 
(see Mowrer, 1960, pp. 426-432; 
Prokasy, 1960). The results of the 
Calvin et al. and Kendrick experi- 
ments may well be due to peculiari- 
ties of the experimental procedure. If 
not, one should expect “extinction 
with reinforcement” to occur in many 
other kinds of performance, such as a 
rat’s bar pressing or a pigeon’s peck- 
ing in a Skinner box, and in many 
types of repetitious motor tasks. 

One experiment is highly relevant 
to theoretical predictions regarding 
the effects of drive on motor learning. 
Wasserman (1951), using a motor 
learning task (alphabet printing) 
found that high motivation resulted 
in performance which was signifi- 
cantly superior to that of low motiva- 
tion (in both massed and distributed 
practice groups), the difference be- 
coming progressively greater as prac- 
tice continued. The Jones revision 
would predict just the opposite. 
Since D must always be greater than
I_R, D × sI_R would result in greater
performance decrement for the highly 
motivated group. The motivation in 
this experiment was controlled by the 
instructions given to the subjects, one 
group being task-oriented, the other 
ego-oriented. 


I_R × sH_R


This formulation of an interaction
between reactive inhibition and habit
strength implies that the decremental
effects on performance caused by the
conditions producing I_R (effort and
rate of response) will be greater for
strong than for weak habits. This is
patently incorrect, since it is known
that there is a positive correlation
between number of reinforced re-
sponses, of which sH_R is a function,
and the number of responses emitted
during extinction. The I_R × sH_R
formulation would predict just the
opposite, i.e., a negative correlation
between number of reinforcements
and number of responses to extinc-
tion. This conclusion is not weakened
by the fact (for example, Reid, 1953)
that in learning to make a discrimina-
tion reversal the animals that have
had a greater number of prereversal
trials learn the reversal more quickly.
This phenomenon may be interpreted
in terms of the animal's also over-
learning the act of making a discrimi-
nation (in addition to learning to
respond differentially to the S^D and
S^Δ), which facilitates the learning of
the reversal.


sH_R × sI_R


This formulation, derived from
Iwahara (1957), is subject to the
same criticism just made in the case
of I_R × sH_R. It implies that the
stronger the habit, the more quickly
it should extinguish, which certainly
is not true.

K − İ_R

The suggestion of Woodworth and
Schlosberg (1954), that total inhibi-
tion (İ_R = I_R + sI_R) be subtracted
from incentive motivation, K (a func-
tion of amount of reinforcement),
seems plausible, in that extinction
involves the withdrawal of incentive.
Within the total Hullian formulation,
however, the Woodworth and Schlos-
berg suggestion meets with the same
difficulties pointed out in the two
previous cases. Thus:


sE_R = D × (K − I_R − sI_R) × sH_R

In expanded form:

sE_R = D × K × sH_R − D × I_R × sH_R
       − D × sI_R × sH_R


Thus we have again all of the ele-
ments that have already been criti-
cized. Spence (1956) has argued, on
the basis of experimental findings,
that D and K are additive rather than
multiplicative as in Hull. But here
again the defects of the Woodworth
and Schlosberg suggestion of K − İ_R
are evident.


sE_R = (D + K − İ_R) × sH_R


Expanded:

sE_R = D × sH_R + K × sH_R − sH_R × İ_R


The last term in the expanded form-
ula again meets with the same diffi-
culty pointed out above. It must be
concluded that the K − İ_R formula-
tion is not an improvement on Hull or
Spence.
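
As a check on the algebra, the two K − İ_R formulations can be expanded numerically. The sketch assumes the readings sE_R = D × (K − I_R − sI_R) × sH_R and sE_R = (D + K − İ_R) × sH_R, with İ_R = I_R + sI_R; the function names and values are hypothetical. Its only purpose is to show that each expansion contains products of inhibition with habit strength, the kind of term criticized in the preceding sections.

```python
import math


def multiplicative_form(D, K, IR, sIR, sHR):
    """Hull-style multiplicative D and K, with total inhibition taken from K."""
    return D * (K - IR - sIR) * sHR


def multiplicative_expanded(D, K, IR, sIR, sHR):
    """Same quantity, expanded term by term."""
    return D * K * sHR - D * IR * sHR - D * sIR * sHR


def additive_form(D, K, IR, sIR, sHR):
    """Spence-style additive D and K, with total inhibition subtracted."""
    return (D + K - (IR + sIR)) * sHR


def additive_expanded(D, K, IR, sIR, sHR):
    """Same quantity, expanded; the last term is sH_R times total inhibition."""
    return D * sHR + K * sHR - sHR * (IR + sIR)


args = dict(D=2.0, K=1.5, IR=0.4, sIR=0.3, sHR=0.8)   # hypothetical values

print(multiplicative_form(**args), multiplicative_expanded(**args))
print(additive_form(**args), additive_expanded(**args))
print(math.isclose(multiplicative_form(**args), multiplicative_expanded(**args)))
```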


SUMMARY 


Several attempts to reformulate
Hull's theory with respect to the
inhibition postulates have been criti-
cized. Because of the limitations of
both Hull and his revisers in the
exact quantification of intervening
variables, much of the choice between
alternative versions of the theory
must be made on the basis of plausi-
bility of congruence with empirical
findings rather than of prediction of
these findings in the rigorous sense of
the term. All of the attempted re-
visions to date, with the possible
exception of that of Spence, have
serious shortcomings in the light of
experimental evidence. They cannot,
therefore, be regarded as improve-
ments over Hull's original formula-
tion of reaction potential. Advances
will be made, not by the mere alge-
braic manipulation of Hull's inter-
vening variables (the method that
characterizes the present attempts),
but by the postulation and quantifi-
cation of new intervening variables,
along with the experimental investi-
gation of their interactions.


The operation of reliable response
sets or stylistic consistencies has been
frequently noted on personality and
attitude scales with a true-false or
agree-disagree format (cf. Cronbach,
1946, 1950; Fricke, 1956; Messick &
Jackson, 1958). It has recently been
conjectured (Jackson & Messick,
1958) that the major common factors
in personality inventories of this type
are interpretable primarily in terms
of such stylistic consistencies rather
than in terms of specific item content.
The present paper attempts to anno-
tate the influence of two response
styles, the tendency to agree or ac-
quiesce and the tendency to respond
in a desirable way, using the Minne-
sota Multiphasic Personality Inven-
tory (MMPI) as an example of
inventories with this general response
form. In particular, a high correla-
tion will be noted between factor
loadings on the largest factor, as
obtained in several published factor
analyses of the MMPI, and certain
indices of acquiescence.

Barnes (1956b), in evaluating the
Berg (1955) deviation hypothesis on
the MMPI, found that the tendency
to answer atypically or deviantly
"true" was highly correlated with
scores on the psychotic scales, and
the tendency to answer atypically
"false" was highly correlated with the
neurotic triad. This result is consist-
ent with the fact, noted by Cottle and
Powell (1951) and others (Barnes,
1956b; Fricke, 1956), that a large
proportion of MMPI psychotic items
are keyed true and a large proportion
of neurotic items keyed false, suggest-
ing that differential tendencies to
respond atypically "true" and "false"
might have been involved in the dis-
crimination of criterion groups upon
which the scoring keys were based.
Barnes (1956a) also pointed out a
marked similarity between the corre-
lations of MMPI scales with these
two deviant response tendencies and
factor loadings for the scales on
the two major factors reported by
Wheeler, Little, and Lehner (1951);
he concluded that the number of
atypical true answers is a "pure
factor test" of the first or "psychotic"
factor and that the number of deviant
false answers has a high loading on
the second or "neurotic" factor. The
two major MMPI factors obtained
by Welsh (1956) also displayed a
similar pattern of loadings, and it is
noteworthy that the "pure factor"
reference scale A which Welsh devel-
oped for his first or "anxiety" factor
had 38 out of 39 items keyed true,
while the reference scale R for the
second or "repression" factor had all
40 of its items keyed false.


¹ This study is part of a larger project on
stylistic determinants in clinical personality
[...] graciously supplying scoring keys for his
"pure" MMPI scales and Philip E. Slater for
making available his factor analyses of the
MMPI.

In view of the striking similarity
between the effects of consistent
tendencies to respond "true" and
"false" and patterns of factor load-
ings obtained in two studies of
MMPI scales, all factor analyses of
the MMPI readily available in the
literature were reviewed, in order to
evaluate the possible relationship
between each scale's factor loading on
the major factor and an index of its
potential for reflecting acquiescence.
The particular index of acquiescence
used was the proportion of items
keyed true on each scale, which,
assuming that the acquiescence-evok-
ing properties of items are uniform
over all MMPI scales, can be consid-
ered to reflect the extent to which
total scores on a scale are influenced
by consistent tendencies to respond
"true." High scores on a scale with a
large proportion of items keyed true
would thus be assumed to reflect a
general tendency to acquiesce, in
addition, of course, to the contribu-
tion of other stylistic tendencies
and of systematic content responses.
Jackson (1960) used this index to
evaluate the effects of acquiescence
on the California Psychological In-
ventory, and Voas (1958) used the
proportion of items keyed false as a
criterion for constructing response
bias scales. Voas (1958) also esti-
mated loadings for scales from the
MMPI and the Guilford-Zimmerman
Temperament Survey on a factor
marked by two measures of the tend-
ency to respond "false" and found
that these loadings correlated .86
with the proportion of items keyed
false on each scale. These findings
support the use of the index in the
present context.

Factor loadings for MMPI scales 
were obtained from eight studies 
by Abrams (1949, summarized by 
French, 1953), Cook and Wherry 
(1950), Cottle (1950), Tyler (1951), 
Wheeler, Little, and Lehner (1951), 
Welsh (1956), Slater (1958), and Kas- 
sebaum, Couch, and Slater (1959). A 
fairly uniform finding from these
studies is that only two major factors
and two or three minor ones are 
necessary to account for interrela- 
tions among the scales. Spearman 
rank correlations were computed 
between loadings on the largest factor 
in each study and the proportion of 
items keyed true on each scale; the 
results are summarized in Table 1. In 
some of the factor analyses, values 
were not reported for scales with 
small loadings on the factor, so 
in computing correlation coefficients 
these scales were considered to be 
tied at an appropriate rank below 
scales with reported positive loadings 
and above scales with reported nega- 
tive loadings. Corrections for ties (cf. 
Siegel, 1956) were computed for two 
of the studies with the most scales 
tied at the same rank (Wheeler, 
Little, & Lehner’s normal sample and 
Tyler’s sample), but the coefficients 
changed only .01. 
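
The rank-correlation procedure described above can be sketched in a few lines. The sketch computes Spearman's coefficient as Pearson's r on average ranks, which is one standard way of handling ties; unreported small loadings are entered as 0.0 so that they tie between the reported positive and negative loadings. The loadings and proportions below are invented for illustration and are not taken from any of the studies.

```python
def average_ranks(values):
    """Rank values from low to high, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman rank correlation, with ties handled by average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: 0.0 stands in for loadings too small to be reported.
loadings = [0.85, 0.60, 0.0, 0.0, -0.40, -0.70]
proportion_true = [0.95, 0.70, 0.55, 0.45, 0.30, 0.10]

print(round(spearman_rho(loadings, proportion_true), 3))
```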

Of 11 different subject samples 
represented in these eight studies, 
significant correlations were obtained 
for 8 of them, four of the coefficients 
exceeding .85. These strikingly con- 
sistent findings indicate that in most 
of these studies the largest factor on 
the MMPI is interpretable in terms 
of acquiescence. In evaluating the 
few apparently inconsistent results, 
it is important to note that for 
Abrams's (1949) neurotic sample, the
correlation with the largest factor
was −.15, but with the second largest
it was .52. Also, in Tyler's (1951)
study the correlation with the largest 
rotated factor was .33, but with the 
unrotated first centroid it was .52, 
p<.05. These findings suggest that 
for those studies in which the corre- 
spondence between the proportion of 
items keyed true and the factor 
loadings was not close, the factor 
structures could have been rotated to 
produce a higher correlation. Ana-
lytical procedures similar to the
computation of β weights in multiple
correlation analysis are available
(Mosier, 1939) for rotating to maxi-
mize the correlation between a factor
and a criterion, which in this case
would be a vector of proportions of
true items. However, an adequate
application of this technique requires
loadings for all the scales on the fac-
tors under consideration, and for
those studies providing this informa-
tion (e.g., Welsh, 1956) there was
usually little need to rotate.


TABLE 1

[RANK CORRELATIONS BETWEEN] FACTOR LOADINGS ON THE LARGEST MMPI
[FACTOR AND THE PROPORTION OF ITEMS KEYED] "TRUE" ON EACH SCALE

Study                              Scales included
---------------------------------  --------------------------------------------------
Abrams, 1949                       11 scales: L, F, Hs, D, Hy, Pd, Mf, Pa, Pt, Sc, Ma
Cook & Wherry, 1950                11 scales: L, F, Hs, D, Hy, Pd, Mf, Pa, Pt, Sc, Ma
Cottle, 1950                       11 scales: L, F, Hs, D, Hy, Pd, Mf, Pa, Pt, Sc, Ma
Tyler, 1951                        15 scales: Hs, D, Hy, Pd, Mf, Pa, Pt, Sc, Ma, Si,
                                   St, Pr, Ac, Re, Do
Wheeler, Little, & Lehner, 1951    12 scales: L, K, F, Hs, D, Hy, Pd, Mf, Pa, Pt, Sc,
                                   Ma
Welsh, 1956                        11 pure scales (list partly legible): Hs', D', Hy',
                                   Pd', Mf', [...], Sc', Ma', Si'
                                   11 pure scales plus (list partly legible): Nm, Dp,
                                   Fm, A, Gm, Je, R
Slater, 1958                       43 scales (list partly legible): L, F, K, Hs, D,
                                   Hy, Pd, Mf, Pa, Pt, Sc, Ma, Si, R, Im, Pr, To, C,
                                   P, Sp, Rp, Sy, St, Lp, Do, Es, Ie, Ac, Ai, O-I,
                                   Ne, Ca, Pl, Ht, Cht, Zi, Zs, [...]
Kassebaum, Couch, & Slater, 1959   32 scales (list partly legible): L, F, K, Hs, D,
                                   Hy, Pd, [...], Rp, R, A, Dp, To, O-I

TABLE 1 (continued)

Study                              Sample                                   ρ
---------------------------------  ---------------------------------------  ----------------------
Abrams, 1949                       [n illegible] normal male veterans       .907**
                                   [n illegible] neurotic male veterans     −.148 (largest factor)
                                                                            .516 (2nd largest)
Cook & Wherry, 1950                111 male naval submarine candidates      .605*
Cottle, 1950                       400 male veterans                        .916**
Tyler, 1951                        107 female graduate students             [illegible]
Wheeler, Little, & Lehner, 1951    112 male college students                [illegible]
                                   110 male neuropsychiatric patients       [illegible]
Welsh, 1956                        150 male VA general hospital patients    [illegible]
                                   Same 150 males                           [illegible]
Slater, 1958                       102 aged males                           [illegible]
                                   109 aged females                         [illegible]
Kassebaum, Couch, & Slater, 1959   160 Harvard College freshmen             [illegible]


Another consideration which sug-
gests that a rotation of axes might
clarify the role of acquiescence on the
MMPI is the fact that scales with
high loadings on the second largest
MMPI factor usually tend to have a
high proportion of false items in their
keys. Kassebaum, Couch, and Slater
(1959) noticed this in their factor 
results and suggested that their 
second factor partly reflected a gen- 
eral tendency to respond “false.” 
Although correlations between the 
proportion of items keyed true and 
loadings on the second MMPI factor 
are usually not nearly as high as cor- 
relations with the first factor, some 
significant coefficients occur; e.g., the 
correlation between the proportion of 
items keyed true and loadings on the 
second factor in the study by Kasse- 
baum, Couch, and Slater (1959) was 
−.44, p<.05 with 30 df, and in
Welsh's (1956) study it was −.64,
p<.05 with 13 df.

This result is consistent with
Barnes' (1956a) finding of a corre-
spondence between atypical true
answers and the first MMPI factor
and atypical false answers and the
second factor. Since these two factors
are usually orthogonal, this corre-
spondence might be considered evi-
dence for two relatively independent
response biases, one a tendency to
agree and the other to disagree.
Such a contention is consistent with
Barnes' (1956b) finding of a correla-
tion of .11 between deviant responses
answered "true" and "false" and
with the fact that Welsh's (1956) A
and R scales are usually only slightly
negatively correlated. Although these
results cannot be accounted for by a
simple response set of acquiescence, it
is not necessary to postulate two
independent sets to agree and to
disagree. As has been pointed out
(Jackson & Messick, 1958), all that is
required to account for the findings is
the operation of at least one other
factor in conjunction with acquies-
cence. Thus, the A scale can have a
high positive loading on an acquies-
cence factor and the R scale a high
negative loading, yet the two scales
could be uncorrelated if they both
had positive, or negative, loadings on
some other dimension. Other factors
which could moderate the operation
of acquiescence on the MMPI might
be specific content dimensions or
some other response style. As previ-
ously suggested (Jackson & Messick,
1958), a particularly likely candidate
for such a role is the stylistic tend-
ency to respond in a desirable way.
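
The point about the A and R scales can be made concrete with the usual rule for orthogonal common factors, under which the correlation two scales owe to those factors is the sum of the products of their loadings. The loadings below are hypothetical and chosen only to show the cancellation; the second factor is left unnamed, as in the argument above.

```python
def implied_correlation(loadings_x, loadings_y):
    """Correlation implied by loadings on orthogonal common factors."""
    return sum(a * b for a, b in zip(loadings_x, loadings_y))


# Hypothetical loadings: (acquiescence factor, some other dimension).
scale_A = (0.7, 0.6)    # high positive loading on acquiescence
scale_R = (-0.6, 0.7)   # high negative loading on acquiescence

r_two_factors = implied_correlation(scale_A, scale_R)
r_acquiescence_only = implied_correlation(scale_A[:1], scale_R[:1])

print("acquiescence alone:", round(r_acquiescence_only, 2))
print("with second factor:", round(r_two_factors, 2))
```

Acquiescence alone would make the two scales clearly negatively correlated; a shared positive loading on a second dimension can offset this completely.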

Possible influences on MMPI scores
of a set to respond desirably have 
been widely documented (cf. De Soto 
& Kuethe, 1959; Edwards, 1957; 
Fordyce, 1956; Hanley, 1956, 1957; 
Jackson & Messick, 1958; Taylor, 
1959; Wiggins & Rumrill, 1959). 
Fordyce (1956), for example, has 
noted a marked similarity between
loadings on the largest MMPI factor
from Wheeler, Little, and Lehner's
(1951) psychiatric sample and corre-
lations of MMPI scales with a meas-
ure of desirability. In fact, the rank
correlation between the loadings and
the correlation coefficients is approxi-
mately −.75, and since the propor-
tion of items keyed true on each
MMPI scale correlates only about
−.50 with the desirability coeffi-
cients, it seems likely that a combina-
tion of desirability and acquiescence
would lead to even better prediction 
of the factor (cf. Messick, 1959). 
Although this and some other re- 
ported relationships are somewhat 
equivocal because the measures of 
desirability used were partially con- 
founded with acquiescence, e.g., Ed- 
wards’ SD scale and Hanley’s Ex 
scale, high correlations have also been 
reported between MMPI scales and 
desirability measures having a bal- 
anced number of true and false items 
(Edwards, 1957; Hanley, 1957; Wig- 
gins & Rumrill, 1959). 

In an attempt to take these find- 
ings into account, it is suggested that 
the acquiescence-evoking properties 
of items are not, as assumed above, 
uniform over all scales, but that 
acquiescence is elicited differentially 
as a function, perhaps, of specific 
item content, of the clarity or ambi- 
guity with which the content is 
stated, and in particular of the per- 
ceived desirability of the statement. 
In the extreme, it is suggested that 
the two major factors usually found 
for the MMPI may be rotated into 
positions interpretable as two re- 
sponse styles—the tendency to ac- 
quiesce and the tendency to respond 
desirably. The negative poles of these 
dimensions would be the tendencies 
to disagree and to respond undesir-
ably, respectively. Response vari-
ance on MMPI scales would then be
primarily a function of these two styl-
istic components in various weighted
proportions. Studies including inde-
pendent marker variables for the two
styles are of course required to
identify the factor positions. Much
research is also needed into the pre-
cise nature of the set to respond
desirably, particularly in view of
three complicating results: (a) the
finding of consistent individual differ-
ences in judgments of desirability
(Messick, 1960); (b) the distinction
between personal and social desira-
bility (Borislow, 1958; Rosen, 1956);
and (c) the differentiation between a
tendency to endorse certain desirable
items which exhibit large mean shifts
under desirability instructions and
the tendency to endorse other desir-
able items which presumably reflect a
group norm (Voas, 1958; Wiggins,
1959).

In conclusion, the findings offer
clear evidence that acquiescence, as
moderated by item desirability, plays
a dominant role in personality inven-
tories like the MMPI. Focused
empirical investigations are required
to develop a refined interpretation of
these and other stylistic consistencies
in terms of personality organization
and psychopathology.


SCALES AND STATISTICS:
PARAMETRIC AND NONPARAMETRIC¹


The recent rise of interest in the use
of nonparametric tests stems from
two main sources. One is the concern
about the use of parametric tests
when the underlying assumptions are
not met. The other is the problem of
whether or not the measurement scale
is suitable for application of paramet-
ric procedures. On both counts
parametric tests are generally more in
danger than nonparametric tests.
Because of this, and because of a
natural enthusiasm for a new tech-
nique, there has been a sometimes
uncritical acceptance of nonparamet-
ric procedures. By now a certain
degree of agreement concerning the
more practical aspects involved in the
choice of tests appears to have been
reached. However, the measurement-
theoretical issue has been less clearly
resolved. The principal purpose of
this article is to discuss this latter
issue further. For the sake of com-
pleteness, a brief overview of practi-
cal statistical considerations will also
be included.

A few preliminary comments are
needed in order to circumscribe the
subsequent discussion. In the first
place, it is assumed throughout that
the data at hand arise from some sort
of measuring scale which gives nu-
merical results. This restriction is
implicit in the proposal to compare
parametric and nonparametric tests
since the former do not apply to
strictly categorical data (but see
Cochran, 1954). Second, parametric
tests will mean tests of significance
which assume equinormality, i.e.,
normality and some form of homo-
geneity of variance. For convenience,
parametric test, F test, and analysis
of variance will be used synony-
mously. Although this usage is not
strictly correct, it should be noted
that the t test and regression analysis
may be considered as special applica-
tions of F. Nonparametric tests will
refer to significance tests which make
considerably weaker distributional
assumptions as exemplified by rank
order tests such as the Wilcoxon T,
the Kruskal-Wallis H, and by the
various median-type tests. Third, the
main focus of the article is on tests of
significance with a lesser emphasis on
descriptive statistics. Problems of
estimation are touched on only
slightly although such problems are
becoming increasingly important.


¹ An earlier version of this paper was pre-
sented at the April 1959 meetings of the
Western Psychological Association. The au-
thor's thanks are due F. N. Jones and J. B.
Sidowski for their helpful comments.

Finally, a word of caution is in 
order. It will be concluded that 
parametric procedures constitute the 
everyday tools of psychological sta- 
tistics, but it should be realized that 
any area of investigation has its own 
statistical peculiarities and that gen- 
eral statements must always be 
adapted to the prevailing practical 
situation. In many cases, as in pilot 
work, for instance, or in situations in 
which data are cheap and plentiful, 
nonparametric tests, shortcut para- 
metric tests (Tate & Clelland, 1957), 
or tests by visual inspection may well
be the most efficient.


PRACTICAL STATISTICAL
CONSIDERATIONS


The three main points of compari-
son between parametric and non-
parametric tests are significance level,
power, and versatility. Most of the 
relevant considerations have been 
treated adequately by others and 
only a brief summary will be given 
here. For more detailed discussion, 
the articles of Cochran (1947), Sav- 
age (1957), Sawrey (1958), Gaito 
(1959), and Boneau (1960) are espe- 
cially recommended. 

Significance level. The effects of 
lack of equinormality on the signifi- 
cance level of parametric tests have 
received considerable study. The
two handiest sources for the psychologist
are Lindquist’s (1953) citation
of Norton’s work, and the recent
article of Boneau (1960) which sum- 
marizes much of the earlier work. 
The main conclusion of the various
investigators is that lack of equinormality
has remarkably little effect
although two exceptions are noted:
one-tailed tests and tests with considerably
disparate cell n’s may be
rather severely affected by unequal
variances.²
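
The kind of sampling study summarized by Boneau can be sketched in a few lines. The fragment below is only an illustration under stated assumptions, not a reproduction of any cited study: the populations, sample sizes, number of replications, seed, and function names are all chosen here for convenience, and the critical values are the usual tabled ones for a two-tailed t test at the .05 level. The first case draws both samples from the same markedly skewed population with equal n; the second pairs the larger variance with the smaller sample, which is one of the two exceptions just noted.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """Two-sample t statistic using the pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))


def type_i_rate(draw_a, draw_b, n_a, n_b, crit, reps=4000, seed=1):
    """Proportion of null experiments in which |t| exceeds the tabled value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        a = [draw_a(rng) for _ in range(n_a)]
        b = [draw_b(rng) for _ in range(n_b)]
        hits += abs(pooled_t(a, b)) > crit
    return hits / reps


def skewed(rng):
    return rng.expovariate(1.0)   # exponential population, mean 1


def narrow(rng):
    return rng.gauss(0.0, 1.0)    # normal, standard deviation 1


def wide(rng):
    return rng.gauss(0.0, 2.0)    # normal, standard deviation 2


# Tabled two-tailed .05 values of t: 2.048 for 28 df, 2.101 for 18 df.
# The null hypothesis (equal population means) is true in both cases.
rate_skewed = type_i_rate(skewed, skewed, 15, 15, crit=2.048)
rate_unequal = type_i_rate(wide, narrow, 5, 15, crit=2.101)
print(f"skewed populations, equal n:  {rate_skewed:.3f}")
print(f"unequal variances, unequal n: {rate_unequal:.3f}")
```

Under these assumptions one would expect the first rate to stay in the neighborhood of the nominal .05 and the second to run well above it; the exact figures depend on the seed and the number of replications.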

A somewhat different source of 
perturbation of significance level 
should also be mentioned. An over- 
all test of several conditions may 
show that something is significant 
but will not localize the effects. As is 
well known, the common practice of t
testing pairs of means tends to inflate
the significance level even when
the over-all F is significant. An
analogous inflation occurs with nonparametric
tests. There are parametric
multiple comparison procedures
which are rigorously applicable
in many such situations (Duncan,
1955; Federer, 1955) but analogous
nonparametric techniques have as
yet been developed in only a few
cases.

² The split-plot designs (e.g., Lindquist,
1953) commonly used for the analysis of repeated
or correlated observations have been
subject to some criticism (Cotton, 1959;
Greenhouse & Geisser, 1959) because of the
additional assumption of equal correlation
which is made. However, tests are available
which do not require this assumption (Cotton,
1959; Greenhouse & Geisser, 1959; Rao, 1952).
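
The size of the inflation produced by testing all pairs of means can be gauged roughly as follows. The sketch assumes, purely for simplicity, that the pairwise tests are independent, which they are not when they share means and an error term, and it ignores any preliminary over-all F; the figures are therefore only a crude guide. The function name is hypothetical.

```python
from math import comb


def familywise_rate(alpha, k_groups):
    """Number of pairwise tests among k groups, and the chance of at least one
    false rejection if each test is run at level alpha.

    Assumes independent tests; this is an approximation only.
    """
    n_pairs = comb(k_groups, 2)
    return n_pairs, 1 - (1 - alpha) ** n_pairs


for k in (3, 5, 7):
    pairs, rate = familywise_rate(0.05, k)
    print(f"{k} groups, {pairs} pairs: about {rate:.2f}")
```

Even under this rough approximation, the nominal .05 level is left far behind once more than a few conditions are compared.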

Power. As Dixon and Massey 
(1957) note, rank order tests are 
nearly as powerful as parametric 
tests under equinormality. Con- 
sequently, there would seem to be no 
pressing reason in most investiga- 
tions to use parametric techniques 
for reasons of power if an appropriate 
rank order test is available (but see 
Snedecor, 1956, p. 120). Of course, 
the loss of power involved in dichoto- 
mizing the data for a median-type 
test is considerable. 
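
A rough sampling check of the first statement might run as follows. Everything specific in the sketch is an assumption made here for illustration: two normal populations with equal variance, a true difference of .8 standard deviations, 20 scores per group, the normal approximation for the rank order test (computed in its Mann-Whitney form, which is equivalent to the rank sum test), and the tabled critical values shown in the comments.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """Two-sample t statistic using the pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))


def rank_sum_z(a, b):
    """Normal deviate for the Mann-Whitney count (continuous data, so no ties)."""
    u = sum(1 for x in a for y in b if x > y)
    na, nb = len(a), len(b)
    mean_u = na * nb / 2
    sd_u = math.sqrt(na * nb * (na + nb + 1) / 12)
    return (u - mean_u) / sd_u


def estimate_power(shift, n=20, reps=2000, seed=7):
    """Proportion of experiments in which each test rejects at the .05 level."""
    rng = random.Random(seed)
    t_hits = rank_hits = 0
    for _ in range(reps):
        a = [rng.gauss(shift, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        t_hits += abs(pooled_t(a, b)) > 2.024      # tabled t, 38 df, two-tailed
        rank_hits += abs(rank_sum_z(a, b)) > 1.960  # normal deviate, two-tailed
    return t_hits / reps, rank_hits / reps


power_t, power_rank = estimate_power(shift=0.8)
print(f"t test: {power_t:.2f}   rank order test: {power_rank:.2f}")
```

With these settings the two proportions would be expected to lie within a few hundredths of each other, in line with the remark of Dixon and Massey.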

Although it might thus be argued 
that rank order tests should be gen- 
erally used where applicable, it is to 
be suspected that such a practice 
would produce negative transfer to 
the use of the more incisive experi- 
mental designs which need para- 
metric analyses. The logic and com- 
puting rules for the analysis of vari- 
ance, however, follow a uniform pat- 
tern in all situations and thus provide 
maximal positive transfer from the 
simple to the more complex experiments.

There is also another aspect of 
power which needs mention. Not in- 
frequently, it is possible to use exist- 
ing data to get a rough idea of the 
chances of success in a further related 
experiment, or to estimate the N re- 
quired for a given desired probability 
of success (Dixon & Massey, 1957, 
Ch. 14). Routine methods are avail- 
able for these purposes when para- 
metric statistics are employed but 
similar procedures are available only 
for certain nonparametric tests such 
as chi square. 
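
As an indication of how routine such estimates are in the parametric case, the following sketch uses the common normal-approximation formula for two independent groups, n = 2[(z_alpha + z_power)(sd)/(difference)]^2 per group. This is not claimed to be the procedure of Dixon and Massey; the function name and default values are assumptions, and the approximation runs slightly below the value an exact calculation based on t would give.

```python
import math
from statistics import NormalDist


def n_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate N per group for a two-tailed comparison of two means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # deviate cutting off alpha/2 in each tail
    z_power = z.inv_cdf(power)           # deviate for the desired probability of success
    return math.ceil(2 * ((z_alpha + z_power) * sd / difference) ** 2)


# Hypothetical pilot figures: a difference of 5 units with a standard deviation of 10.
print(n_per_group(difference=5.0, sd=10.0))
```

The standard deviation would ordinarily be taken from the error term of the existing data, and the difference from whatever effect the investigator regards as worth detecting.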

Versatility. One of the most remarkable
features of the analysis of
variance is the breadth of its ap- 
plicability, a point which has been 
emphasized by Gaito (1959). For 
present purposes, the ordinary fac- 
torial design will serve to exemplify 
the issue. Although factorial designs 
are widely employed, their uses in the 
investigation and control of minor 
variables have not been fully ex- 
ploited. Thus, Feldt (1958) has 
noted the general superiority of the 
factorial design in matching or equat- 
ing groups, an important problem 
which is but poorly handled in cur- 
rent research (Anderson, 1959). 
Similarly, the use of replications as a 
factor in the design makes it possible 
to test and partially control for drift 
or shift in apparatus, procedure, or 
subject population during the course 
of an experiment. In the same way, 
taking experimenters or stimulus ma- 
terials as a factor allows tests which 
bear on the adequacy of standardiza- 
tion of the experimental procedures 
and on the generalizability of the re- 
sults. 

An analogous argument could be 
given for latin squares, largely re- 
habilitated by the work of Wilk and 
Kempthorne (1955), which are useful 
when subjects are given successive 
treatments; for orthogonal poly- 
nomials and trend tests for corre- 
lated scores (Grant, 1956) which give 
the most sensitive tests when the in- 
dependent variable is scaled; as well 
as for the multivariate analysis of 
variance (Rao, 1952) which is appli- 
cable to correlated dependent vari- 
ables measured on incommensurable 
scales. 

The point to these examples and to 
the more extensive treatment by 
Gaito is straightforward. Their anal- 
ysis is more or less routine when 
parametric procedures are used. 
However, they are handled inadequately
or not at all by current nonparametric
methods.

It thus seems fair to conclude that 
parametric tests constitute the stand- 
ard tools of psychological statistics. 
In respect of significance level and 
power, one might claim a fairly even 
match. However, the versatility of 
parametric procedures is quite un- 
matched and this is decisive. Unless 
and until nonparametric tests are de- 
veloped to the point where they meet 
the routine needs of the researcher as 
exemplified by the above designs, 
they cannot realistically be con- 
sidered as competitors to parametric 
tests. Until that day, nonparametric 
tests may best be considered as use- 
ful minor techniques in the analysis 
of numerical data. 

Too promiscuous a use of F is, of 
course, not to be condoned since there 
will be many situations in which the 
data are distributed quite wildly. 
Although there is no easy rule with 
which to draw the line, a frame of 
reference can be developed by studying
the results of Norton (Lindquist,
1953) and of Boneau (1960). It is 
also quite instructive to compare p 
values for parametric and nonpara- 
metric tests of the same data. 

It may be worth noting that one of 
the reasons for the popularity of non- 
parametric tests is probably the cur- 
rent obsession with questions of sta- 
tistical significance to the neglect of 
the often more important questions of 
design and power. Certainly some 
minimal degree of reliability is gen- 
erally a necessary justification for 
asking others to spend time in assess- 
ing the importance of one’s data. 
However, the question of statistical 
significance is only a first step, and a
relatively minor one at that, in the 
over-all process of evaluating a set of 
results. To say that a result is sta- 
tistically significant simply gives 
reasonable ground for believing that
some nonchance effect was obtained.
The meaning of a nonchance effect 
rests on an assessment of the design of 
the investigation. Even with judi- 
cious design, however, phenomena 
are seldom pinned down in a single 
study so that the question of replica- 
bility in further work often arises 
also. The statistical aspects of
these two questions are not without
importance but tend to be neglected 
when too heavy an emphasis is 
placed on p values. As has been 
noted, it is the parametric procedures 
which are the more useful in both re- 
spects. 


MEASUREMENT SCALE 
CONSIDERATIONS 


The second and principal part of 
the article is concerned with the rela- 
tions between types of measurement 
scales and statistical tests. For convenience,
therefore, it will be assumed
that lack of equinormality
presents no serious problem. Since
the F ratio remains constant with
changes in unit or zero point of the
measuring scale, we may ignore ratio
scales and consider only ordinal and
interval scales. These scales are defined
following Stevens (1951).
Briefly, an ordinal scale is one in 
which the events measured are, in 
some empirical sense, ordered in the 
same way as the arithmetic order of 
the numbers assigned to them. An 
interval scale has, in addition, an 
equality of unit over different parts 
of the scale. Stevens goes on to char- 
acterize scale types in terms of 
permissible transformations. For an
ordinal scale, the permissible trans- 
formations are monotone since they 
leave rank order unchanged. For an 
interval scale, only the linear trans- 
formations are permissible since 
only these leave relative distance 
unchanged. Some workers (e.g.,
Coombs, 1952) have considered various
scales which lie between the ordinal
and interval scales. However,
it will not be necessary to take this 
further refinement of the scale typol- 
ogy into account here. 
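
The remark that the F ratio is unchanged by a change of unit or zero point, and the contrast with a transformation that is monotone but not linear, can be checked on a small set of made-up scores. The sketch works with the two-group t, whose square is the two-group F; the scores, the constants of the linear change, and the choice of squaring as the monotone change are all assumptions made for illustration.

```python
import math
import statistics


def pooled_t(a, b):
    """Two-sample t statistic using the pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))


group_a = [2, 3, 3, 4, 5, 5, 6]
group_b = [1, 2, 2, 3, 3, 4, 4]


def linear(xs, unit=2.5, zero=-40.0):
    """Change of unit and zero point: permissible for an interval scale."""
    return [unit * x + zero for x in xs]


def squared(xs):
    """Monotone for positive scores, but not linear: permissible only for an ordinal scale."""
    return [x ** 2 for x in xs]


t_original = pooled_t(group_a, group_b)
t_linear = pooled_t(linear(group_a), linear(group_b))
t_squared = pooled_t(squared(group_a), squared(group_b))
print(t_original, t_linear, t_squared)
```

The first two values agree apart from rounding error, whereas the third differs, which is the whole of the distinction between the two classes of transformation so far as F is concerned.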

As before, we suppose that we have 
a measuring scale which assigns num- 
bers to events of a certain class. It is 
assumed that this measuring scale is 
an ordinal scale but not necessarily 
an interval scale. In order to fix 
ideas, consider the following example. 
Suppose that we are interested in 
studying attitude toward the church. 
Subjects are randomly assigned to 
two groups, one of which reads Communication
A, while the other reads
Communication B. The subjects’ 
attitudes towards the church are 
then measured by asking them to 
check a seven category pro-con rating 
scale. Our problem is whether the 
data give adequate reason to con- 
clude that the two communications 
had different effects. 

To ascertain whether the com- 
munications had different effects, 
some statistical test must be ap- 
plied. In some cases, to be sure, the 
effects may be so strong that the test 
can be made by inspection. In most 
cases, however, some more objective 
method is necessary. An obvious 
procedure would be to assign the 
numbers 1 to 7, say, to the rating 
scale categories and apply the F test, 
at least if the data presented some 
semblance of equinormality. How- 
ever, some writers on statistics (e.g., 
Siegel, 1956; Senders, 1958) would 
object to this on the ground that the 
rating scale is only an ordinal scale, 
the data are therefore not “truly
numerical,” and hence that the
operations of addition and multipli- 
cation which are used in computing F 
cannot meaningfully be applied to 
the scores. There are three different
questions involved in this objection,
and much of the controversy over 
scales and statistics has arisen from 
a failure to keep them separate. Ac- 
cordingly, these three questions will 
be taken up in turn. 
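
The obvious procedure just described, assigning the numbers 1 to 7 to the categories and applying F, may be sketched as follows. The ratings are invented for the purpose, ten subjects to a group, and the function name is hypothetical; the computation itself is the ordinary one-way analysis of variance.

```python
import statistics


def one_way_f(groups):
    """F ratio for a one-way analysis of variance on a list of groups."""
    scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(scores)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical checks on the seven category scale, coded 1 (con) to 7 (pro).
ratings_a = [5, 6, 4, 7, 5, 6, 3, 5, 6, 4]   # read Communication A
ratings_b = [3, 4, 2, 5, 4, 3, 5, 2, 4, 3]   # read Communication B

f_ratio = one_way_f([ratings_a, ratings_b])
# Tabled .05 value of F for 1 and 18 df is 4.41.
print(f"F(1, 18) = {f_ratio:.2f}")
```

Whether such a computation is legitimate on a rating scale of this kind is exactly what the three questions below are meant to settle.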

Question 1. Can the F test be applied
to data from an ordinal scale? It
is convenient to consider two cases of 
this question according as the as- 
sumption of equinormality is satis- 
fied or not. Suppose first that equi- 
normality obtains. The caveat 
against parametric statistics has been 
stated most explicitly by Siegel 
(1956) who says: 
The conditions which must be satisfied... . 
before any confidence can be placed in any 
probability statement obtained by the use of 
the t test are at least these: ... 4. The variables
involved must have been measured in at
least an interval scale . . . (p. 19). (By permis- 
sion, from Nonparametric Statistics, by 
S. Siegel. Copyright, 1956. McGraw-Hill 
Book Company, Inc.) 


This statement of Siegel’s is completely
incorrect. This particular
question admits of no doubt whatsoever.
The F (or t) test may be
applied without qualm. It will then 
answer the question which it was de- 
signed to answer: can we reasonably 
conclude that the difference between 
the means of the two groups is real 
rather than due to chance? The 
justification for using F is purely 
statistical and quite straightfor- 
ward; there is no need to waste space 
on it here. The reader who has 
doubts on the matter should postpone 
them to the discussion of the two 
subsequent questions, or read the 
elegant and entertaining article by 
Lord (1953). As Lord points out, 
the statistical test can hardly be 
cognizant of the empirical meaning of 
the numbers with which it deals. 
Consequently, the validity of a sta- 
tistical inference cannot depend on 
the type of measuring scale used.

The case in which equinormality
does not hold remains to be consid- 
ered. We may still use F, of course, 
and as has been seen in the first part, 
we would still have about the same 
significance level in most cases. The 
F test might have less power than a 
rank order test so that the latter 
might be preferable in this simple two 
group experiment. However, insofar 
as we wish to inquire into the relia- 
bility of the difference between the 
measured behavior of the two groups 
in our particular experiment, the 
choice of statistical test would be 
governed by purely statistical consid- 
erations and have nothing to do with 
scale type. 

Question 2. Will statistical results be 
invariant under change of scale? The 
problem of invariance of result stems 
from the work of Stevens (1951) who 
observes that a statistic computed on 
data from a given scale will be invari- 
ant when the scale is changed accord- 
ing to any given permissible transfor- 
mation. It is important to be precise 
about this usage of invariance. It 
means that if a statistic is computed 
from a set of scale values and this 
statistic is then transformed, the 
identical result will be obtained as 
when the separate scale values are 
transformed and the statistic is com- 
puted from these transformed scale 
values. 
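
This usage of invariance can be made concrete with a few invented scores. In the sketch below the exponential serves as an arbitrary example of a monotone transformation and 2x + 3 as an arbitrary linear one; the scores and the variable names are assumptions. An odd number of scores is used so that the median is itself one of the scores.

```python
import math
import statistics

scores = [1, 2, 2, 3, 7]


def linear(x):
    return 2 * x + 3          # permissible for an interval scale


monotone = math.exp           # order preserving; permissible for an ordinal scale

# Median: transform the statistic, or compute the statistic on transformed scores.
median_then_monotone = monotone(statistics.median(scores))
monotone_then_median = statistics.median([monotone(x) for x in scores])

# Mean under the same monotone transformation.
mean_then_monotone = monotone(statistics.mean(scores))
monotone_then_mean = statistics.mean([monotone(x) for x in scores])

# Mean under the linear transformation.
mean_then_linear = linear(statistics.mean(scores))
linear_then_mean = statistics.mean([linear(x) for x in scores])

print(median_then_monotone, monotone_then_median)
print(mean_then_monotone, monotone_then_mean)
print(mean_then_linear, linear_then_mean)
```

The two routes agree for the median under the monotone transformation and for the mean under the linear one, but not for the mean under the monotone transformation.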

Now our scale of attitude toward 
the church is admittedly only an 
ordinal scale. Consequently, we 
would expect it to change in the 
direction of an interval scale in future 
work. Any such scale change would 
correspond to a monotone transfor- 
mation of our original scale since only 
such transformations are permissible 
with an ordinal scale. Suppose then 
that a monotone transformation of 
the scale has been made subsequent 
to the experiment on attitude change.
We would then have two sets of data:
the responses as measured on the 
original scale used in the experiment, 
and the transformed values of these 
responses as measured on the new, 
transformed scale. (Presumably, 
these transformed scale values would 
be the same as the subjects would 
have made had the new scale been 
used in the original experiment, al- 
though this will no doubt depend on 
the experimental basis of the new 
scale.) The question at issue then 
becomes whether the same signifi- 
cance results will be obtained from 
the two sets of data. If rank order 
tests are used, the same significance 
results will be found in either case 
because any permissible transforma- 
tion leaves rank order unchanged. 
However, if parametric tests are 
employed, then different significance 
statements may be obtained from the 
two sets of data. It is possible to get a 
significant F from the original data 
and not from the transformed data, 
and vice versa. Worse yet, it is even 
logically possible that the means of 
the two groups will lie in reverse order 
on the two scales. 
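
A deliberately extreme set of invented scores shows both points at once. The two groups of three scores, and the use of squaring as the monotone rescaling, are assumptions chosen only to make the arithmetic transparent; the rank order statistic is computed in its Mann-Whitney form.

```python
import statistics


def u_count(a, b):
    """Mann-Whitney U: number of (a, b) pairs with the a score larger; ties count one half."""
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)


group_a = [1, 1, 7]
group_b = [3, 3, 4]

# Squaring preserves order for positive scores, so it is a permissible
# transformation of an ordinal scale.
rescaled_a = [x ** 2 for x in group_a]
rescaled_b = [x ** 2 for x in group_b]

u_original = u_count(group_a, group_b)
u_rescaled = u_count(rescaled_a, rescaled_b)

print("U:", u_original, u_rescaled)
print("means, original scale:", statistics.mean(group_a), statistics.mean(group_b))
print("means, rescaled:      ", statistics.mean(rescaled_a), statistics.mean(rescaled_b))
```

The rank order statistic is the same on both scales, while the mean of Group B exceeds that of Group A on the original scale and falls below it on the rescaled one.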

The state of affairs just described is 
clearly undesirable. If taken uncriti- 
cally, it would constitute a strong 
argument for using only rank order 
tests on ordinal scale data and re- 
stricting the use of F to data obtained 
from interval scales. It is the purpose 
of this section to show that this con- 
clusion is unwarranted. The basis of 
the argument is that the naming of 
the scales has begged the psychologi- 
cal question. 

Consider interval scales first, and 
imagine that two students, P and Q, 
in an elementary lab course are as- 
signed to investigate some process. 
This process might be a ball rolling on 
a plane, a rat running an alley, or a 
child doing sums. The students
cooperate in the experimental work,
making the same observations, except 
that they use different measuring 
scales. P decides to measure time 
intervals. He reasons that it makes 
sense to speak of one time interval as 
being twice another, that time inter- 
vals therefore form a ratio scale, and 
hence a fortiori an interval scale. Q 
decides to measure the speed of the 
process (feet per second, problems per 
minute). By the same reasoning as 
used by P, Q concludes that he has an 
interval scale also. Both P and Q are 
aware of current strictures about
scales and statistics. However, since
each believes (and rightly so) that he
has an interval scale, each uses means
and applies parametric tests in writing
his lab report. Nevertheless,
when they compare their reports they
find considerable difference in their
descriptive statistics and graphs (Figure
1), and in their F ratios as well.
Consultation with a statistician shows
that these differences are direct consequences
of the difference in the
measuring scales. Evidently then,
possession of an interval scale does
not guarantee invariance of interval
scale statistics.

FIG. 1. Temporal aspects of some process
obtained from a 2 × 2 design. (The data are
plotted as a function of Variable A with Variable
B as a parameter. Subscripts denote the
two levels of each variable. Note that Panel
P shows an interaction, but that Panel Q does
not.)
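
How far the two reports can diverge is easily seen with invented figures. In the sketch below, three trials are run under each of two hypothetical conditions, X and Y, over a unit distance, so that speed is simply the reciprocal of time; exact fractions are used to keep the arithmetic free of rounding. None of these numbers comes from the figure.

```python
from fractions import Fraction
from statistics import mean

times_x = [2, 2, 8]     # P's record for condition X, in seconds
times_y = [3, 3, 3]     # P's record for condition Y

# Q records the same trials as speeds (unit distance per second).
speeds_x = [Fraction(1, t) for t in times_x]
speeds_y = [Fraction(1, t) for t in times_y]

print("mean time:  X =", mean(times_x), " Y =", mean(times_y))
print("mean speed: X =", mean(speeds_x), " Y =", mean(speeds_y))
```

On P's scale condition X takes longer on the average, while on Q's scale condition X is also the faster on the average, although both students observed the very same trials.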

For ordinal scales, we would expect 
to obtain invariance of result by using 
ordinal scale statistics such as the 
median (Stevens, 1951). Let us sup- 
pose that some future investigator 
finds that attitude toward the church 
is multidimensional in nature and 
has, in fact, obtained interval scales 
for each of the dimensions. In some 
of his work he chanced to use our 
original ordinal scale so that he was 
able to find the relation between this 
ordinal scale and the multidimen- 
sional representation of the attitude. 
His results are shown in Figure 2. 
Our ordinal scale is represented by 
the curved line in the plane of the two 
dimensions. Thus, a greater distance 
from the origin as measured along the 
line stands for a higher value on our 
ordinal scale. Points A and B on the 
curve represent the medians of Groups 
A and B in our experiment, and it is 
seen that Group A is more pro-church 
than Group B on our ordinal scale. 
The median scores for these two 
groups on the two dimensions are 
obtained simply by projecting Points 
A and B onto the two dimensions. All 
is well on Dimension 2 since there 
Group A is greater than Group B. On 
Dimension 1, however, a reversal is 
found: Group A is less than Group B,
contrary to our ordinal scale results.
Evidently then, possession of an
ordinal scale does not guarantee
invariance of ordinal scale statistics.

FIG. 2. The curved line represents the ordinal
scale of attitude toward the church plotted
in the two-dimensional space underlying the
attitude. (Points A and B denote the medians
of two experimental groups. The graph is
hypothetical, of course.)
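
The geometry of the example can be put in numbers. The curve used below is an assumption made for illustration and is not the curve of the figure: position along the ordinal scale is expressed as a value s between 0 and 1, Dimension 2 rises steadily with s, and Dimension 1 rises and then falls.

```python
def locate(s):
    """Hypothetical coordinates, in the two underlying dimensions, of ordinal scale value s."""
    dimension_1 = 4 * s * (1 - s)   # rises to a peak at s = .5, then falls
    dimension_2 = s                 # rises steadily along the curve
    return dimension_1, dimension_2


median_a, median_b = 0.8, 0.5       # Group A is more pro-church on the ordinal scale

a_dim1, a_dim2 = locate(median_a)
b_dim1, b_dim2 = locate(median_b)

print("ordinal scale:", median_a, median_b)
print("Dimension 1:  ", a_dim1, b_dim1)
print("Dimension 2:  ", a_dim2, b_dim2)
```

Group A stands above Group B on the ordinal scale and on Dimension 2, but below it on Dimension 1, because Dimension 1 is not a monotone function of the ordinal scale.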

A rather more drastic loss of invari- 
ance would occur if the ordinal scale 
were measuring the resultant effect of 
two or more underlying processes. 
This could happen, for instance, in 
the study of approach-avoidance 
conflict, or ambivalent behavior, as 
might be the case with attitude 
toward the church. In such situa- 
tions, two people could give identical 
responses on the one-dimensional scale 
and yet be quite different as regards 
the two underlying processes. For 
instance, the same resultant could 
occur with two equal opposing tend- 
encies of any given strength. Repre- 
senting such data in the space formed 
by the underlying dimensions would 
yield a smear of points over an entire
region rather than a simple curve as
in Figure 2. 

Although it may be reasonable to 
think that simple sensory phenomena 
are one-dimensional, it would seem 
that a considerable number of psy- 
chological variables must be con- 
ceived of as multidimensional in 
nature as, for instance, with “IQ”
and other personality variables. Ac- 
cordingly, as the two cited examples 
show, there is no logical guarantee 
that the use of ordinal scale statistics 
will yield invariant results under scale 
changes. 

It is simple to construct analogous 
examples for nominal scales. How- 
ever, their only relevance would be to 
show that a reduction of all results to 
categorical data does not avoid the 
difficulty with invariance. 

It will be objected, of course, that 
the argument of the examples has 
violated the initial assumption that 
only “permissible” transformations
would be used in changing the measuring
scales. Thus, speed and time
are not linearly related, but rather 
the one is a reciprocal transformation 
of the other. Similarly, Dimension 1 
of Figure 2 is no monotone transfor- 
mation of the original ordinal scale. 
This objection is correct, to be sure, 
but it simply shows that the problem 
of invariance of result with which one 
is actually faced in science has no 
particular connection with the invariance
of “permissible” statistics. The
examples which have been cited show 
that knowing the scale type, as deter- 
mined by the commonly accepted 
criteria, does not imply that future 
scales measuring the same phenomena
will be “permissible” transformations
of the original scale. Hence
the use of “permissible” statistics,
although guaranteeing invariance of 
result over the class of “permissible”
transformations, says little about
invariance of result over the class of
scale changes which must actually be 
considered by the investigator in his 
work. 

This point is no doubt pretty obvi- 
ous, and it should not be thought that 
those who have taken up the scale- 
type ideas are unaware of the prob- 
lem. Stevens, at least, seems to ap- 
preciate the difficulty when, in the 
concluding section of his 1951 article, 
he distinguishes between psychologi- 
cal dimensions and indicants. The 
former may be considered as inter- 
vening variables whereas the latter 
are effects or correlates of these vari- 
ables. However, it is evident that an 
indicant may be an interval scale in 
the customary sense and yet bear a 
complicated relation to the underly- 
ing psychological dimensions. In such 
cases, no procedure of descriptive or 
inferential statistics can guarantee invariance
over the class of scale changes
which may become necessary. 

It should also be realized that only 
a partial list of practical problems of 
invariance has been considered. Ef- 
fects on invariance of improvements 
in experimental technique would also 
have to be taken into account since 
such improvements would be ex- 
pected to purify or change the de- 
pendent variable as well as decrease 
variability. There is, in addition, a 
problem of invariance over subject 
population. Most researches are 
based on some handy sample of sub- 
jects and leave more or less doubt 
about the generality of the results. 
Although this becomes in large part 
an extrastatistical problem (Wilk & 
Kempthorne, 1955), it is one which 
assumes added importance in view of 
Cronbach’s (1957) emphasis on the 
interaction of experimental and sub- 
ject variables. In the face of these 
assorted difficulties, it is not easy to 
see what utility the scale typology
has for the practical problems of the
investigator. 

The preceding remarks have been 
intended to put into broader perspec- 
tive that sort of invariance which is 
involved in the use of permissible 
statistics. They do not, however, 
solve the immediate problem of
whether to use rank order tests or F
in case only permissible transformations
need be considered. Although
invariance under permissible scale 
transformations may be of relatively 
minor importance, there is no point in 
taking unnecessary risks without the 
possibility of compensation. 

On this basis, one would perhaps 
expect to find the greatest use of rank 
order tests in the initial stages of 
inquiry since it is then that measuring 
scales will be poorest. However, it is 
in these initial stages that the possibly
relevant variables are not well
known so that the stronger experimental
designs, and hence parametric
procedures, are most needed.
Thus, it may well be most efficient to 
use parametric tests, balancing any 
risk due to possible permissible scale 
changes against the greater power 
and versatility of such tests. In the 
later stages of investigation, we 
would be generally more sure of the 
scales and the use of rank order proce- 
dures would waste information which 
the scales by then embody. 

At the same time, it should be 
realized that even with a relatively 
crude scale such as the rating scale of 
attitude toward the church, the 
possible permissible transformations 
which are relevant to the present 
discussion are somewhat restricted. 
Since the F ratio is invariant under 
change of zero and unit, it is no re- 
striction to assume that any trans- 
formed scale also runs from 1 to 7. 
This imposes a considerable limitation
on the permissible scale transformations
which must be considered.
In addition, whatever psychological 
worth the original rating scale pos- 
sesses will limit still further the trans- 
formations which will occur in prac- 
tice. 

Although rank order tests do pos- 
sess some logical advantage over 
parametric tests when only permissi- 
ble transformations are considered, 
this advantage is, in the writer's 
opinion, very slight in practice and 
does not begin to balance the greater 
versatility of parametric procedures. 
The problem is, however, an empiri- 
cal one and it would seem that some 
historical analysis is needed to pro- 
vide an objective frame of reference. 
To quote an after-lunch remark of K. 
MacCorquodale, “Measurement theory
should be descriptive, not proscriptive,
nor prescriptive.” Such an
inquiry could not fail to be fascinat- 
ing because of the light it would 
throw on the actual progress of 
measurement in psychology. One 
investigation of this sort would prob- 
ably be more useful than all the 
speculation which has been written 
on the topic of measurement. 

Question 3. Will the use of parametric
as opposed to nonparametric
statistics affect inferences about underlying
psychological processes? In a
narrow sense, Question 3 is irrelevant 
to this article since the inferences in 
question are substantive, relating to 
psychological meaning, rather than 
formal, relating to data reliability. 
Nevertheless, it is appropriate to 
discuss the matter briefly in order to 
make explicit some of the considera- 
tions involved because they are often 
confused with problems arising under 
the two previous questions. With no 
pretense of covering all aspects of this 
question, the following two examples 
will at least touch some of the problems.

The first example concerns the two
students, P and Q, mentioned above, 
who had used time and speed as 
dependent variables. We suppose 
that their experiment was based on a 
2 × 2 design and yielded means as
plotted in Figure 1. This graph por- 
trays main effects of both variables 
which are seen to be similar in na- 
ture in both panels. However, our 
principal concern is with the inter- 
action which may be visualized as 
measuring the degree of nonparallel- 
ism of the two lines in either panel. 
Panel P shows an interaction. The 
reciprocals of these same data, plotted 
in Panel Q, show no interaction. It is 
thus evident in the example, and true 
in general, that interaction effects 
will depend strongly on the measur- 
ing scales used. 
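
A set of invented cell values shows how an interaction can be present on one scale and absent on its reciprocal. The four values below are assumptions and are not read from Figure 1; each is treated as the single score of its cell, so that taking reciprocals of the cell values raises no question about averaging. Exact fractions avoid rounding.

```python
from fractions import Fraction


def interaction(cells):
    """Interaction contrast of a 2 x 2 table: (A2B2 - A1B2) - (A2B1 - A1B1).

    A value of zero corresponds to parallel lines in the graph.
    """
    return (cells["A2B2"] - cells["A1B2"]) - (cells["A2B1"] - cells["A1B1"])


# Q's speed scores: the effect of A is the same at both levels of B.
speed = {"A1B1": Fraction(1), "A2B1": Fraction(2),
         "A1B2": Fraction(2), "A2B2": Fraction(3)}

# P's time scores for the same cells are the reciprocals.
times = {cell: 1 / value for cell, value in speed.items()}

print("interaction, speed:", interaction(speed))
print("interaction, time: ", interaction(times))
```

The contrast is zero for the speed scores and not zero for the time scores, so that the lines are parallel in the one graph and not in the other.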

Assessing an interaction does not 
always cause trouble, of course. Had 
the lines in Panel P, say, crossed each 
other, it would not be likely that any 
change of scale would yield uncrossed 
lines. In many cases also, the scale 
used is sufficient for the purposes at 
hand and future scale changes need 
not be considered. Nevertheless, it is 
clear that a measure of caution will 
often be needed in making inferences 
from interaction to psychological 
process. If the investigator envisages 
the possibility of future changes in 
the scale, he should also realize that a 
present inference based on significant 
interaction may lose credibility in the 
light of the rescaled data. 

It is certainly true that the inter- 
pretation of interactions has some- 
times led to error. It may also be 
noted that the usual factorial design 
analysis is sometimes incongruent 
with the phenomena. In a 2 × 2 design
it might happen, for example,
that three of the four cell means are 
equal. The usual analysis is not 
optimally sensitive to this one real 
difference since it is distributed over
three degrees of freedom. In such
cases, there will often be other parametric
tests involving specific comparisons
(Snedecor, 1956) or multiple
comparisons (Duncan, 1955) which are
more appropriate. Occasionally also, 
an analysis of variance based on a 
multiplicative model (Williams, 1952) 
will be useful (Jones & Marcus, 1961). 
A judicious choice of test may be of 
great help in dissecting the results. 
However, the test only answers set 
questions concerning the reliability of 
the results; only the research worker 
can say which questions are appropri- 
ate and meaningful. 
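
The remark about three equal cell means can be verified by direct computation of the sums of squares. The sketch assumes ten scores per cell and a deviant cell mean of 4, both arbitrary, and uses the standard formula for the sum of squares of a contrast among means; the function name is hypothetical.

```python
from fractions import Fraction


def contrast_ss(means, weights, n):
    """Sum of squares (1 df) for a contrast among cell means, n scores per cell."""
    value = sum(w * m for w, m in zip(weights, means))
    return n * value ** 2 / sum(w * w for w in weights)


n = 10
# Cell means in the order A1B1, A1B2, A2B1, A2B2; only the last differs.
means = [Fraction(0), Fraction(0), Fraction(0), Fraction(4)]

ss_a = contrast_ss(means, [-1, -1, 1, 1], n)        # main effect of A
ss_b = contrast_ss(means, [-1, 1, -1, 1], n)        # main effect of B
ss_ab = contrast_ss(means, [1, -1, -1, 1], n)       # interaction
ss_single = contrast_ss(means, [-1, -1, -1, 3], n)  # deviant cell against the other three

print(ss_a, ss_b, ss_ab, ss_single)
```

The usual analysis divides the effect equally among A, B, and the interaction, whereas the single comparison of the deviant cell with the other three collects the same total on one degree of freedom and is accordingly tested against the error term with a mean square three times as large.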

Inferences based on nonparamet- 
ric tests of interaction would pre- 
sumably be less sensitive to certain 
types of scale changes. However, 
caution would still be needed in the 
interpretation as has been seen in 
Question 2. The problem is largely 
academic, however, since few nonparametric
tests of interaction exist.³
It might be suggested that the ques- 
tion of interaction cannot arise when 
only the ordinal properties of the 
data are considered since the interac- 
tion involves a comparison of differ- 
ences and such a comparison is illegit- 
imate with ordinal data. To the 
extent that this suggestion is correct, 
a parametric test can be used to the 
same purposes equally well if not 
better; to the extent that it is not cor- 
rect, nonparametric tests will waste 
information. 

One final comment on the first
example deserves emphasis. Since
both time and speed are interval
scales, it cannot be argued that the
difficulty in interpretation arises because
we had only ordinal scales.

³ There is a nomenclatural difficulty here.
Strictly speaking, nonparametric tests should
be called more-or-less distribution free tests.
For example, the Mood-Brown generalized
median test (Mood, 1950) is distribution free,
but is based on a parametric model of the same
sort as in the analysis of variance. As noted
in the introduction, the usual terminology is
used in this article.

The second example, suggested by 
J. Kaswan, is shown in Figure 3. The
graph, which is hypothetical, plots 
amount of aggressiveness as a func- 
tion of amount of stress. A glance at 
the graph leads immediately to the 
inference that some sort of threshold 
effect is present. Under increasing 
stress, the organism remains quies- 
cent until the stress passes a certain 
threshold value, whereupon the or- 
ganism leaps into full scale aggressive 
behavior. 

Confidence in this interpretation is 
shaken when we stop to consider that 
the scales for stress and aggression 
may not be very good. Perhaps, 
when future work has given us im- 
proved scales, these same data would 
yield a quite different function such 
as a straight line. 
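
How a change of scale can turn a threshold curve into a straight line may be sketched numerically. The scores below are hypothetical, and the rescaling used (replacing each score by its rank) is only one of many admissible monotone transformations, chosen here for illustration. The second differences of a curve are zero when it is a straight line over equally spaced stress values.

```python
# Hypothetical scores (illustrative assumptions, not data from the article).
stress = [1, 2, 3, 4, 5, 6]
aggression = [1.0, 1.2, 1.5, 2.0, 8.0, 9.5]   # nearly flat, then a sudden jump


def ranks(values):
    """Rank of each value (1 = smallest); assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def second_differences(values):
    """All zero for a straight line over equally spaced x values."""
    return [values[i + 1] - 2 * values[i] + values[i - 1]
            for i in range(1, len(values) - 1)]


# One admissible monotone rescaling: replace each score by its rank.
rescaled = [float(r) for r in ranks(aggression)]

print("original curve shape:", second_differences(aggression))
print("rescaled curve shape:", second_differences(rescaled))
print("rank order unchanged:", ranks(aggression) == ranks(rescaled))
```

The rank order of the observations survives the rescaling; the apparent threshold does not.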

One extreme position regarding the 
threshold effect would be to say that 
the scales give rank order information 
and no more. The threshold infer- 
ence, or any inference based on char- 
acteristics of the curve shape other 
than the uniform upward trend, 
would then be completely disallowed. 
At the other extreme, there would be 
complete faith in the scales and all 
inferences based on curve shape, 
including the threshold effect, would 
be made without fear that they would 
be undermined by future changes in 
the scales. In practice, one would 
probably adopt a position between 
these two extremes, believing, with 
Mosteller (1958), that our scales 
generally have some degree of numer- 
ical information worked into them, 
and realizing that to consider only the 
rank order character of the data 
would be to ignore the information 
that gives the strongest hold on the 
behavior. 

From this ill-defined middle ground, 
inferences such as the threshold effect 

Fig. 3. Aggressiveness plotted as a function 
of stress. (The curve is hypothetical. Note 
the hypothetical threshold effect.) 


would be entertained as guides to 
future work. Such inferences, how- 
ever, are made at the judgment of the 
investigator. Statistical techniques 
may be helpful in evaluating the 
reliability of various features of the 
data, but only the investigator can 
endow them with psychological 
meaning. 


SUMMARY 


This article has compared paramet- 
ric and nonparametric statistics 
under two general headings: practical 
statistical problems, and measure- 
ment theoretical considerations. The 
scope of the article is restricted to 
situations in which the dependent 
variable is numerical, thus excluding 
strictly categorical data. 

Regarding practical problems, it 
was noted that the difference between 
parametric and rank order tests was 
not great insofar as significance level 
and power were concerned. However, 
only the versatility of parametric 
statistics meets the everyday needs of 
psychological research. It was con- 
cluded that parametric procedures 
are the standard tools of psychological 
statistics although nonparametric 
procedures are useful minor tech- 
niques. 

Under the heading of measurement 
theoretical considerations, three ques- 
tions were distinguished. The well- 
known fact that an interval scale is 
not prerequisite to making a statisti- 
cal inference based on a parametric 
test was first pointed out. The second 
question took up the important 
problem of invariance. It was noted 
that the practical problems of invari- 
ance or generality of result far trans- 
cend measurement scale typology. In 

Several years ago, Cattell (1946) 
published a description of what he 
called the ‘‘covariation chart,” a 
graphic model which illustrates six 
basic forms of covariation with which 
we may deal in psychological re- 
search. It is the purpose of the 
present paper to describe an exten- 
sion and modification of Cattell’s 
schema that will provide much more 
comprehensive classification of actual 
and possible research designs in 
psychology. 

The six forms of covariation en- 
compassed by Cattell’s model have 
been variously labeled with letters 
through the alphabetic range from M 
to T, but the labeling indicated in 
Table 1 has come to be reasonably 
standard. The covariation chart 
itself consists of a parallelepiped, in 
which the three dimensions represent 
tests, persons, and occasions. Any 
plane parallel to any surface of the 
model represents a score matrix 
which might correspond to the data 
from a psychological research. There 
are three such sets of planes, any one 


TABLE 1 

THE SIX BASIC FORMS OF COVARIATION 
INDICATED IN THE COVARIATION CHART 

Technique   Variables     Series over         Variables held 
            correlated    which correlated    constant or singular 

R           Tests         Persons             Occasions 
Q           Persons       Tests               Occasions 
P           Tests         Occasions           Persons 
O           Occasions     Tests               Persons 
S           Persons       Occasions           Tests 
T           Occasions     Persons             Tests 




plane permitting consideration of two 
kinds of covariation. 
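
The two kinds of covariation available in a single plane can be illustrated with a small score matrix. The sketch below is only an illustration under stated assumptions: the scores are hypothetical, the helper names `pearson` and `column` are introduced here for convenience, and the product-moment correlation is used as the index of covariation. The plane shown is the persons-by-tests plane for one fixed occasion, which permits R technique and Q technique.

```python
from math import sqrt

# Hypothetical score matrix for one occasion (illustrative values only):
# rows are persons, columns are tests.
scores = [
    [12.0, 15.0, 7.0],
    [10.0, 11.0, 9.0],
    [14.0, 18.0, 6.0],
    [9.0, 10.0, 10.0],
]


def pearson(x, y):
    """Product-moment correlation between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def column(matrix, j):
    """Extract column j (one test) as a series over persons."""
    return [row[j] for row in matrix]


# R technique: correlate two tests over the series of persons.
r_tests = pearson(column(scores, 0), column(scores, 1))

# Q technique: correlate two persons over the series of tests.
q_persons = pearson(scores[0], scores[1])

print("R technique, tests 1 and 2:", round(r_tests, 3))
print("Q technique, persons 1 and 2:", round(q_persons, 3))
```

The same matrix serves both techniques; only the direction in which the series is read changes.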

The major virtue of a classification 
scheme like that embodied in the 
covariation chart is that it can sug- 
gest forms of valuable research which 
might otherwise be overlooked. As 
Cattell himself has clearly recog- 
nized, however, the scope of the 
covariation chart model has certain 
unfortunate limitations. When he 
first presented the covariation chart, 
Cattell pointed out that the six tech- 
niques did not really exhaust the 
forms of covariation inherently de- 
rivable from the three-dimensional 
model. The other forms which he 
considered at that time, however, are 
essentially variants or compounds of 
the six basic forms of covariation. 


Various more novel techniques will 
emerge, of course, if we can find 
justification for adding other dimen- 
sions to the model. In a more recent 
publication, Cattell (1957) points out 
that a psychological event may be 
characterized in terms of six inde- 
pendent “tags”: a reacting organism, 
a focal stimulus, a background condi- 
tion, a response, an occasion in time 
and space, and an observer. He sug- 
gests that any pair of tags may serve 
as the dimensions of a score matrix 
yielding a technique and its trans- 
pose. Since there are 15 possible pairs 
of tags, there are 15 possible tech- 
niques (and their corresponding 
transposes). Furthermore, the ele- 
ments within any matrix could corre- 
spond to any of the six tags. Logi- 
cally, this would extend the system to 
90 possible techniques (or 180, includ- 
ing transposed techniques). Cattell 
apparently excludes some combina- 
tions and speaks of 45 possible tech- 
niques. To these he adds five addi- 
tional possibilities that involve a 
mixture of tags along one axis of the 
score matrix. 
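
The counts of 15, 90, and 180 follow from simple enumeration, as the sketch below shows. The short tag names are abbreviations introduced here for the six tags listed above, and the variable names are illustrative.

```python
from itertools import combinations

# The six "tags" of a psychological event, abbreviated from the text.
tags = ["organism", "focal stimulus", "background condition",
        "response", "occasion", "observer"]

# Each unordered pair of tags supplies the two axes of a score matrix.
axis_pairs = list(combinations(tags, 2))

# If the entries within a matrix may correspond to any of the six tags:
techniques = [(pair, entry) for pair in axis_pairs for entry in tags]

print(len(axis_pairs))        # pairs of tags
print(len(techniques))        # techniques, not counting transposes
print(2 * len(techniques))    # techniques together with their transposes
```

The enumeration places no restriction on which tag fills the cells, so it includes the combinations that Cattell apparently excludes in arriving at 45.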

In the view of the writer, the 
original covariation chart provides 
too limited a classification system. 
The extended model, however, intro- 
duces needless complexity and is 
subject to useful modification and 
simplification. Cattell’s six tags 
represent six distinguishable aspects 
of any observed psychological event, 
but they do not, on that account, 
constitute six meaningfully distin- 
guishable aspects of research design. 

The distinction between focal stim- 
ulus and background condition is a 
somewhat arbitrary one, and its use- 
fulness in design classification is 
questionable. We can nearly always 
isolate a great variety of stimulus 
variables that will influence a given 
event in a more or less direct way. 
Insofar as the researcher analyzes the 
effect of one of these variables, it 
becomes a focal stimulus variable, at 
least from the standpoint of the 
researcher and hence of the research 
design. Background conditions are 
otherwise irrelevant to experimental 
design, unless they are confounded 
with other kinds of variables (organ- 
isms or occasions). 

The observer is also a vital part of 
any psychological event dealt with in 
research, but the observer becomes 
important as a component of design 
only to the extent that he is some- 
thing more than an observer. If his 
presence in the situation affects the 
behavior of the subject of the experi- 
ment, the observer becomes to that 
extent a part of the stimulus situation 
and may be analyzed accordingly. If 
our interest, on the other hand, is in 
peculiarities of the observer as a re- 
corder or rater of behavior, we are to 
that extent treating him as a reacting 
organism, i.e., as the subject of an ex- 
periment superimposed on another 
experiment. 


BASIC COMPONENTS OF DESIGN 
IN PSYCHOLOGICAL RESEARCH 


It is possible to characterize a 
psychological event in terms of a 
great number of distinguishable fea- 
tures which set it apart from other 
psychological events, but there are 
basically only four such features that 
constitute essential and distinguish- 
able parameters of any research de- 
sign employed to study psychological 
events. We shall refer to these fea- 
tures henceforth as design components 
and label them R, S, P, and O (not to 
be confused with Techniques R, S, P, 
and Q). 

Design Component R is that realm 
of variables which consists of struc- 
tural or functional manifestations on 
the part of the subject or subjects 
under investigation and which are 
studied through observation and 
measurement of the subject or of 
products of the subject’s behavior. 
Commonly treated as single R-com- 
ponent variables are specific re- 
sponses, score summaries of patterns 
or sets of responses, and attributes. 
Design Component S is that realm of 
variables which arises from sources 
outside the subject and which may be 
expected to influence the subject's 
behavior. S, then, refers to external 
stimuli. Those things which are 
sometimes called “internal stimuli” 
fall within the scope of R-component 
variables if they are directly ob- 
served or measured. The P com- 
ponent is that of the human or ani- 
mal subjects observed in the experi- 
ment. The O component is the 
realm of occasions, in given time and 
space, on which experimental ob- 
servations are made. 

These four components are ordi- 
narily quite distinct from one another 
and subject to separate specification. 
For some purposes, we may artifi- 
cially tie variables of one component 
to those of another. In such cases, we 
may speak of a “confounding” of de- 
sign components. Confounding is 
most common with respect to Com- 
ponent O, which for various purposes 
we permit to vary systematically 
with certain S, P, or R variables. A 
confounding of S and P variables is 
also quite common. 

In a sense, any variable that we 
observe and describe may be said to 
be measured, at least implicitly, for 
if our description contains only an 
identifying qualitative statement, we 
have provided the essential ingredi- 
ents of nominal scaling. Since the 
variables of all four design compo- 
nents are subject to observation and 
description in a psychological experi- 
ment, they may be regarded as sub- 
jected simultaneously and independ- 
ently to measurement and scaling. 
Within any component, variables 
may be scaled at any level—nominal, 
ordinal, interval, or ratio—and are 
sometimes simultaneously scaled at 
more than one level. 

A peculiarity of Component P that 
should be noted is that data within it 
are usually treated as scaled either at 
the nominal or at the ratio level. So 
long as we are concerned merely with 
identifying individuals as distin- 
guishable entities, we make only the 
assumptions of nominal scaling. 
When we treat individuals as equiv- 
alent units that can be added to- 
gether, however, and express P- 
component data in terms of numbers 
of cases or proportions of a total sam- 
ple of subjects, we have made the 
essential assumptions underlying 
ratio scaling. The data could be ex- 
pressed in ordinal form if the label 
identifying the individual assumed 
the form of an index of rank within 
a social hierarchy. We could trans- 
form the data from ordinal form to 
presumably interval form either by 
making certain parametric assump- 
tions or by adopting some appro- 
priate measure of discriminability of 
adjacent ranks as an index of interval 
size. (Numerical data within the 
realm of Component P may assume 
any form consistent with the notion 
of measurement in terms of individuals. 
The application of measurement to 
individuals, however, yields R-com- 
ponent data.) 


AN EXTENDED COVARIATION 
DESIGN CLASSIFICATION 


A consideration of the role played 
by variables of the four design com- 
ponents in the covariation chart re- 
veals that R-component variables 
are consistently assigned to the cells 
within the score matrices correspond- 
ing to Techniques R, Q, P, O, S, and 
T. The numbers in the body of a 
score matrix represent what we con- 
ceive of as the dependent variable in 
an experiment. In psychological re- 
search, the dependent variable is 
customarily, but not inevitably, the 
response variable. While our interest 
may lie in finding what sort of re- 
sponse will appear in a given situa- 
tion, we may seek, with equal justi- 
fication, to determine which individ- 
ual will give a particular response, 
which stimulus will evoke the re- 
sponse, or on what occasion the re- 
sponse will appear. If we thus permit 
any of the four design components to 
furnish the elements within the score 
matrix, we are led to the system of 
24 techniques shown in Table 2. 
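
The figure of 24 can be checked by enumeration. The sketch below assumes that each technique assigns the four design components to four distinct roles: the variables correlated, the series over which covariation is studied, the dependent variable filling the cells, and the component held constant. The role names are labels introduced here for illustration, and no attempt is made to attach the letter names of the techniques to particular assignments.

```python
from itertools import permutations

# The four design components named in the text.
components = ["R", "S", "P", "O"]

# Assumed roles within a covariation design (labels are illustrative).
roles = ("correlated", "series", "dependent", "constant")

# Every way of giving each role a different component.
designs = [dict(zip(roles, assignment))
           for assignment in permutations(components)]

# Designs whose cell entries are response (R-component) variables.
response_dependent = [d for d in designs if d["dependent"] == "R"]

print(len(designs))             # all covariation designs
print(len(response_dependent))  # those with R as the dependent variable
```

Under this assumption, six of the 24 assignments have R-component variables in the cells, which agrees in number with the six techniques of the original covariation chart.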

It may be noted that no component 
appears twice in any row of Table 2. 

TABLE 2 

AN EXTENDED SYSTEM OF COVARIATION DESIGNS 

[Table body not recoverable. Surviving column headings: "Series over which covariation is studied" and "Variable in which variation is" (truncated).] 

Note.—The letters in the second, third, 
fourth, and fifth columns refer to the design 
components from which variables are drawn. 


This classification system assumes 
that the two axes of the matrix and 
the elements within the matrix will 
generally represent three different de- 
sign components. Supporting this 
assumption is the fact that each de- 
sign component represents variables 
which are an integral part of any 
psychological event, and the ques- 
tions raised in psychological research 
normally refer to the manner in 
which variables of the different 
realms represented by the four com- 
ponents converge in a given psycho- 
logical event. It must be granted, 
however, that our assumption is, in 
some respects, an arbitrary one. It is 
possible to conceive of designs in 
which the axes and the matrix ele- 
ments would not represent three dif- 
ferent components, but such designs 
can also be rationalized quite readily 
as variants of techniques already in 
the system. Whether the classification 
system proposed here will generally 
provide the most convenient frame- 
work for design conceptualization 
must ultimately be determined 
through practical application. In 
any case, a classification system of 
this sort cannot be exhaustive if it is 
to remain fairly simple. It can merely 
provide a framework of basic proto- 
typal techniques. Some designs will 
inevitably appear as combinations or 
variants of these techniques. 

It must be emphasized that these 
techniques refer to research designs 
in which covariation is to be ob- 
served, but they do not imply any 
particular form of statistical analysis. 
In general, the desired indices of co- 
variation will be furnished by corre- 
lational methods. Whether a method 
such as factor analysis or cluster anal- 
ysis will be applied subsequently is 
an additional consideration. 


COVARIATION DESIGN AND 
CONCOMITANCE DESIGN 


If we are interested in truly com- 
prehensive classification of psycho- 
logical research designs, we must 
recognize at the outset that most 
psychological experiments are not 
actually concerned with covariation. 
The simplest form of research would 
call for a single measurement. This 
measurement might fall within the 
realm of any of our four design com- 
ponents, and it could be thought of as 
the single element filling a single-cell 
matrix. The variables of the other 
three components would also be 
singular. 

More commonly we speak of re- 
search design when we seek data for a 
matrix of at least two cells and where 
we are interested in a relationship 
among the ingredients of the matrix. 
The relationship may nearly always 
be considered in terms of a concomit- 
ance of two or more elements falling 
within the realm of one of our design 
components, and these elements are 
related in terms of their convergence 
with elements corresponding to a 
different component. If we represent 
all variables or elements of a common 
design component along a common 
axis of a score matrix, the data of 
many experiments must be thought 
of as filling cells arranged serially in a 
single row or column. We relate 
either the single cell rows of a single 
column matrix or the single cell 
columns of a single row matrix. 

The kind of matrix we are now de- 
scribing is a truncated version of the 
kind we assumed in classifying 
covariation designs. We can speak 
meaningfully of concomitance with 
respect to two single cell rows, but 
not of covariation, for this assumes 
two relatable series of values. In a 
single column matrix, whatever com- 
ponent would otherwise have con- 
stituted a horizontal axis is now 
treated as singular. 

Nearly every psychological re- 
search design is concerned with con- 
comitance, but not necessarily with 
covariation (i.e., concomitant varia- 
tion). Since the covariation design is 
really a special case of concomitance 
design, it would be worthwhile to 
have a scheme of classification for 
concomitance designs which would 
parallel that for covariation designs. 
Such a scheme is presented in Table 
3. Since in each concomitance design 
the serial variable is replaced by an 
additional singular variable, each 
concomitance design may be con- 
sidered a truncated version of either 
of two covariation designs. 
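
The relation between the two classifications can be sketched by enumeration. The sketch assumes that a covariation design is fixed by three distinct components (variables related, series, dependent variable) with the fourth held constant, and that a concomitance design is fixed by two (variables related, dependent variable) with the remaining two singular. The tuple layout and variable names are illustrative choices, not the notation of the tables.

```python
from itertools import permutations

# The four design components named in the text.
components = ["R", "S", "P", "O"]

# Covariation designs: (variables related, series, dependent variable).
# The fourth component is the one held constant.
covariation = list(permutations(components, 3))

# Concomitance designs: (variables related, dependent variable).
# The two remaining components are singular.
concomitance = list(permutations(components, 2))

# A covariation design parallels a concomitance design when it relates the
# same variables and has the same dependent variable; it differs only in
# treating one of the two singular components as a series.
parallels = {
    design: [c for c in covariation if (c[0], c[2]) == design]
    for design in concomitance
}

print(len(covariation), len(concomitance))
print(parallels[("S", "R")])
```

Each of the 12 concomitance designs obtained this way has exactly two parallel covariation designs, one for each singular component that could be restored as a series.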


TABLE 3 

BASIC CONCOMITANCE DESIGNS 

[Table body not recoverable. Surviving column headings: Technique; Variables related; Variables in which variation is noted; Singular or constant variables; Parallel covariation designs. One row fragment survives: Mu, P.] 

Note.—The letters in the second, third, and 
fourth columns refer to design components 
from which variables are drawn. 


APPLICATIONS OF CONCOMITANCE 
DESIGNS 

The techniques labeled Alpha, 
Beta, and Gamma in Table 3 repre- 
sent the most familiar forms of psy- 
chological research, and in them we 
find the most frequent application of 
such forms of statistical analysis as 
the critical ratio and analysis of vari- 
ance. Beta technique has a common 
application in the comparison of 
responses of groups which differ with 
respect to variables outside the range 
of observation within the experiment 
(e.g., two different occupational 
groups, psychotics and “normals,” 
men and women, etc.). Comparison 
of matched groups subjected to differ- 
ent stimulus conditions would con- 
stitute a form of Alpha technique, 
since P-component variables are held 
constant. Interest is here focused on 
the relating of stimuli, as in the 
simpler form of Alpha technique in- 
volving such a comparison for a 
single individual or single group of 
individuals. A compounding of tech- 
niques is possible in designs of more 
than one-way classification. Thus, 
we should have a compound of Alpha 
and Beta techniques if we classified 
both in terms of known group mem- 
bership and in terms of stimulus con- 
ditions. The reader will note that the 
score matrix in terms of which we 
conceptualize the design differs from 
the tabular arrangement usually em- 
ployed with analysis of variance in 
that the variables to be related are 
represented along a common axis. 
Thus the score matrix for a complex 
factorial design of the Alpha-tech- 
nique variety would consist of a long 
single column of R data. Each row 
would represent the data for a group 
simultaneously scaled with respect 
to several stimulus dimensions. 

In Techniques Delta, Epsilon, and 
Zeta, the stimulus is conceptually the 
dependent variable. These tech- 
niques bring to mind certain applica- 
tions of psychophysical methods. 
Strictly speaking, the procedures 
usually called “psychophysical meth- 
ods,” as described by such writers as 
Graham (1950) and Guilford (1954), 
are methods of measurement and do 
not define specific experimental de- 
signs to any greater extent than do 
methods of statistical analysis. In 
actual application, however, they 
form a basis for a limited range of 
concomitance designs. 

The most common applications of 
psychophysical methods may be 
thought of as constituting either 
Alpha technique or Delta technique, 
depending largely on the use made of 
the data. The simple application of 
the method of constant stimuli, for 
example, would constitute Delta 
technique if we dealt with the result- 
ing data in terms of a relationship be- 
tween the two response categories. 
Each of the two cells of the corre- 
sponding score matrix would contain 
the value of the stimulus eliciting the 
given response for a certain percent- 
age of trials. On the other hand, find- 
ings may be expressed by means of a 
curve in which stimulus magnitude 
is plotted against the percentage of 
trials in which either response is pro- 
duced. The design may then be con- 
sidered either Alpha technique or 
Delta technique, depending on 
whether we consider the curve as a 
way of expressing relationships 
within a continuous series of S cate- 
gories or within a continuous series of 
R categories (percentages in the pres- 
ent instance). Similar reasoning 
would apply, of course, to the appli- 
cation of other psychophysical meth- 
ods. More complex applications of 
these methods, in which R variables 
are related to a combination of inter- 
acting S dimensions—as in Lick- 
lider’s (1951) treatment of auditory 
functions—may be regarded as com- 
parable to the application of factorial 
design in Alpha technique. Psycho- 
physical methods are less commonly 
applied in research classifiable as 
Epsilon or Zeta technique, although 
certain applications of these methods 
in clinical research (e.g., certain 
studies involving flicker fusion, size 
judgments, distance judgments, and 
judgments of the vertical) would 
certainly qualify as Epsilon tech- 
nique. 
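
The two readings of a psychometric curve can be illustrated with a short sketch. The data are hypothetical, the `interpolate` helper is introduced here for illustration, and linear interpolation is an assumed convenience rather than a prescribed psychophysical procedure. Reading the curve from stimulus to response percentage corresponds to the Alpha interpretation; reading it from response percentage to stimulus value corresponds to the Delta interpretation.

```python
# Hypothetical constant-stimuli data (illustrative values only).
stimulus = [10.0, 12.0, 14.0, 16.0, 18.0]        # stimulus magnitudes
percent_greater = [5.0, 20.0, 50.0, 80.0, 95.0]  # % of trials judged "greater"


def interpolate(x_known, y_known, x):
    """Piecewise-linear interpolation; x_known must be increasing."""
    points = list(zip(x_known, y_known))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the tabulated range")


# Alpha reading: response percentage as a function of the stimulus.
percent_at_13 = interpolate(stimulus, percent_greater, 13.0)

# Delta reading: stimulus magnitude yielding a given response percentage.
stimulus_at_75 = interpolate(percent_greater, stimulus, 75.0)

print("percentage 'greater' at stimulus 13:", percent_at_13)
print("stimulus giving 75% 'greater':", round(stimulus_at_75, 2))
```

The same tabulated points serve both readings; the design classification depends only on which variable is treated as the entry in the score matrix.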

Techniques Eta, Theta, and Iota 
are a common realm of application 
for nonparametric techniques of sta- 
tistical analysis. Depending on the 
manner in which P-component data 
are expressed, we may analyze find- 
ings in terms of cell frequencies, over- 
lap of cases among cells, or compara- 
bility of person ranks associated with 
various cells. 
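
An analysis in terms of cell frequencies and overlap of cases might be sketched as follows. The person tags and cell labels are hypothetical, and the overlap index used (shared cases divided by all cases appearing in either cell, the Jaccard index) is an assumed choice; the text does not prescribe a particular index.

```python
# Hypothetical person tags for two cells of a matrix (illustrative only).
cells = {
    "response A": {"p1", "p2", "p3", "p5"},
    "response B": {"p2", "p3", "p4"},
}

# P-component data expressed as cell frequencies.
frequencies = {name: len(people) for name, people in cells.items()}

# P-component data expressed as overlap of cases between cells.
shared = cells["response A"] & cells["response B"]
either = cells["response A"] | cells["response B"]
overlap_index = len(shared) / len(either)   # Jaccard index (assumed choice)

print(frequencies)
print(sorted(shared), overlap_index)
```

Here the cell entries are treated only as identifying tags, so nothing beyond nominal scaling of the persons is assumed.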

Techniques Kappa, Lambda, and 
Mu are most likely to be useful when 
variations in an occasion variable are 
presumed to covary with certain 
attributes of subjects or with certain 
changes in the life situations of sub- 
jects. The O-component variable 
may thus reflect such things as age, 
developmental stage, and level or 
stage of experience. The develop- 
mental area is probably the most 
common realm of application. Kappa 
technique provides a means for 
grouping behaviors developmentally 
and hence for defining developmental 
stages. Lambda technique provides a 
way of defining stages in terms of 
effective stimuli. Mu technique can 
be used to compare individuals with 
respect to such things as rate of 
maturation. Many applications out- 
side the developmental realm, to 
processes involving shorter time 
spans, are possible. 


APPLICATIONS OF COVARIATION 
DESIGNS 


A detailed discussion of possible 
applications of the familiar Tech- 
niques R, Q, P, O, S, and T would be 
superfluous here. Unfortunately, 
other treatments of these techniques 
have promoted misconceptions by 
obscuring three interrelated con- 
siderations that are basic to con- 
sistent classification. First, there is 
the distinction between concomitance 
and covariation designs. A second 
vital point is that the series over 
which covariation is observed must 
be genuinely treated as a series in 
covariation designs. Wherever a 
group is treated as a unit and a group 
average is treated as a single observa- 
tion, the group functions, for design 
classification purposes, as a single in- 
dividual. Finally, in research em- 
ploying matched groups, the P com- 
ponent is properly viewed as being 
held constant, and appropriate classi- 
fication will depend on what compo- 
nent is confounded with Component 
P. For example, in the common type 
of experiment in which equated con- 
trol and experimental groups are sub- 
jected to different stimulus condi- 
tions, we have an instance of P tech- 
nique (not S technique, as some 
writers would have it), provided that 
response covariation is considered 
over time. The usual application of 
this design, to a single occasion, is 
simply Alpha technique. 

The remaining techniques, A 
through L and U through Z, repre- 
sent virtually unexploited forms of 
design, but careful consideration will 
suggest appropriate uses for each of 
them. In Techniques A through F, 
the dependent variable is of Compo- 
nent S. Appropriate quantification 
might be in terms of minimally 
sufficient stimulus magnitude or 
mean stimulus magnitude associated 
with a given response. In Techniques 
G through L, the dependent variable 
is of Component P. It may be ex- 
pressed in terms of the rank of the in- 
dividual giving a certain response to 
a certain stimulus on a certain occa- 
sion, in terms of the average rank of 
individuals so responding, or in 
terms of the number of individuals so 
responding. In Techniques U, V, W, 
X, Y, and Z, our focal variable—of 
Component O—may be expressed in 
terms of a single occasion in an 
ordered series, an average of a num- 
ber of ordered occasions, an average 
age, an average stage, etc. It is im- 
portant to note that in covariation 
designs, in contrast to concomitance 
designs, the dependent variable must 
be of at least the ordinal level of scal- 
ing. Thus, in Techniques Eta, Theta, 
and Iota, the matrix cells could 
simply contain tags identifying the 
persons fitting the cell coordinates. 
Data analysis would then consist of 
assessing the overlap of entries in 
various cells. In a matrix of several 
rows and columns containing such 
nominal data, we could probably 
speak of “multiple concomitance” 
with respect to a pair of rows or 
columns, but it is doubtful that we 
can properly speak of “covariation” 
unless the data in the body of the 
matrix are expressed in a form repre- 
senting relative magnitudes or posi- 
tions on continua. 

Techniques A, C, G, J, U, and W 
all deal with the covariation of re- 
sponse categories and thus provide a 
basis for defining the structure of a 
response realm. In C and U, we 
assess the comparability of response 
categories on an intra-individual 
basis, with respect to conjoint ap- 
pearance on various occasions or in 
response to various stimuli. Tech- 
niques A, G, J, and W provide a 
means for assessing response com- 
parability on a group basis, in terms 
of similarity of the precipitating 
stimulus, of the occasion of mani- 
festation, or of the persons giving the 
response. 

Techniques R and P are familiar 
techniques for examining stimulus 
covariation in terms of the resulting 
response. Techniques H, K, V, and 
Y add possibilities for correlating 
stimuli in terms of covariation with 
respect to the magnitudes (ranks) or 
numbers of persons responding in a 
certain way or in terms of the par- 
ticular occasions or numbers of occa- 
sions on which the stimulus has a 
given effect. The correlating of per- 
sons is also a familiar idea by virtue 
of its introduction through Q and S 
techniques. Techniques B, E, X, and 
Z point to the possibility of correlat- 
ing persons according to stimuli pro- 
ducing various responses, stimuli pro- 
ducing a given response on various 
occasions, occasions when various re- 
sponses appear, or occasions when 
various stimuli elicit a given response. 
Consideration of the many possible 
ways of defining the basic stimulus, 
response, and occasion data suggests 
a great variety of ways of grouping 
persons according to such things as 
physiological cycles, social roles, and 
developmental patterns. 

In applying any of the occasion- 
correlation techniques (O, T, D, F, 
J, and L), we may select presumably 
equivalent occasions and thus ob- 
tain an estimate of the reliability, or 
stability, of a given pattern of rela- 
tionship. We may, on the other hand, 
select occasions differing in a known 
way and thereby determine the com- 
parability of these occasions. Pos- 
sible applications range from the 
psychophysical realm to the develop- 
mental realm, depending on how the 
occasion variable is defined and 
quantified. In general, the new co- 
variation techniques encompassed by 
this expanded classification system 
promise a rich harvest through novel 
approaches to diverse problems— 
particularly in the developmental, 
social, and physiological areas, where 
the possible fruits of correlational 
analysis have been recognized by too 
few researchers. 

THE SELF-CONCEPT: 
FACT OR ARTIFACT? 

One of the more difficult tasks for 
psychology is relating the observa- 
tion of behavior to the study of 
mental processes. One approach to 
the problem has been to limit psy- 
chology to the study of behavior and 
to leave to philosophy the task of 
speculating as to the existence and 
nature of mind and soul. 

There have, however, been psy- 
chologists who have sought to make 
sense out of human action by positing 
a self or ego, in order that they might 
understand the coherence and unity 
which they have thought that they 
have seen in human behavior. Thus, 
G. W. Allport (1943) claimed that 
the concept of ego was made neces- 
sary by certain shortcomings in as- 
sociationism, and he went on to list 
eight different uses for the concept of 
the ego. During the 1940s the 
Psychological Review was in fact well- 
flavored with articles of philosophical 
taste (Allport, 1943; Bertocci, 1945; 
Chein, 1944; Lundholm, 1940). These 
articles were attempts to find the 
source of human behavior by dis- 
cussions of concepts, but they failed 
to make a lasting distinction between 
the self as subjective knower and the 
self as object of knowledge. The self 
as essence defied definition, and the 
discussions concerning the nature of 
mind seemed relevant for neither 
experimental nor applied psychology. 

But during the 1940s there was a 
parallel attempt at construction of a 
useful concept of the self. While 
Rogers wrestled with the problem of 
researching a client centered ap- 
proach in psychotherapy, one of his 
students (Raimy, 1943) developed a 
construct of the self which had a 
perceptual frame of reference. What 
Raimy called the self-concept was 
both a learned perceptual system 
functioning as an object in the per- 
ceptual field, and a complex organiz- 
ing principle which schematizes on- 
going experience. Raimy demon- 
strated in his dissertation that atti- 
tudes toward the self can be found 
by analyzing counseling protocols, 
and that these self-perceiving atti- 
tudes formed a reliable index for 
improvement in psychotherapy. 

The concept of the self soon 
formed the theoretical underpinning 
for a new approach to the study of 
behavior. Raimy’s construct of the 
self received further development in 
the book Individual Behavior (Snygg 
& Combs, 1949). The authors stated 
that behavior was best understood as 
growing out of the individual sub- 
ject’s frame of reference. Behavior 
was to be interpreted according to 
the phenomenal field of the subject 
rather than be seen in terms of the 
analytical categories of the observer. 

Because the self-concept was born with
client-centered therapy, the theory
of the self and the practice of
psychotherapy were so congruent that a
new self-centered therapy acquired a
theoretical basis for the first time: Rogers
(1951) described therapeutic change 
in a phenomenological frame of 
reference.

By 1950 the phenomenological
view of the self had become the 
center of a new movement in psy- 
chology, having already generated a 
block of research studies (Rogers et 
al., 1949). When Hilgard (1949) pos- 
tulated in his APA presidential ad- 
dress the need for a self to under- 
stand psychoanalytic defense mecha- 
nisms, and called for research on the 
self, psychology listened. To the 
desert came rain that washed all 
before it. 

The deluge of studies within the 
last decade has not been contained 
within any one theoretical channel, 
so that studies involving the self- 
concept have spread over into many 
areas of psychology. Ten years of 
research efforts have produced a 
mass of data, reflecting different 
theoretical assumptions and differing 
research methods. While the time 
has now passed for one article to 
deal adequately with all the studies 
that have been done, the sheer mass 
of evidence would suggest that cer- 
tain questions be asked of theories of 
the self-concept. 

This paper is concerned with the 
problem as to whether the self is an 
objective reality which is a fit field 
for psychological research, or whether 
it is a somewhat nebulous abstrac- 
tion useful only to give a theoretical 
basis to things the psychologist 
could not otherwise understand. Put 
in other words, this paper faces the 
issue as to whether the results of 
studies of the self are to be accepted 
at face value, or whether other ex- 
planations of results would be more 
parsimonious or reasonable. 

The writer will discuss first at- 
tempts to quantify data concerning 
the self-concept to arrive at an 
operational definition. We will then 
assess the validity of measures of the 
self-concept, and will relate the
self-concept to other constructs. We will
briefly allude to attempts to estab- 
lish a relationship between different 
measures. Finally, the writer will 
return to certain philosophical and 
historical considerations in order to 
reach a conclusion as to whether the 
self-concept is indeed a fact of na- 
ture, or an artifact of men’s minds. 


MEASURING THE SELF-CONCEPT 


Many psychologists have believed 
that if something exists it can be 
measured. There have been many 
investigators who have assumed that 
the self-concept refers to an existence 
of some sort and have gone on to 
measure it. 

The most popular type of opera- 
tional definition has assumed that 
the self-concept can be defined in 
terms of the attitudes toward the 
self, as determined either by the 
subject’s references to himself in 
psychotherapy or by asking him to 
mark off certain self-regarding atti- 
tudes on a rating scale. 

One of the first attempts at atti- 
tude measurement was by Sheerer 
(1949), who extracted from the pro- 
tocols of cases at the University of 
Chicago Counseling Center all state- 
ments that were relevant either for 
attitudes to self or to other people. 
These statements formed the basis 
for a 101-item rating scale. The 
Sheerer client statements also formed 
the basis for rating scales constructed 
by Phillips (1951) and by Berger 
(1952). 

The only rating scale of attitudes 
towards self that has been published 
is the Index of Adjustment and Val- 
ues (Bills, 1958). Bills states that the 
intent of the index is to measure the 
phenomenological self view as de- 
scribed by Lecky (1945), Snygg and 
Combs (1949), and Rogers (1951). 
This scale is more elaborate in that 
each item is ranked with three different
instructions. First, the subject
ranks the item on a scale as to how 
well it describes himself. Next, he 
marks the items as to how acceptant 
he is of his first, or self-rating of the 
item, and finally he rates the item as 
to the degree to which he aspires to 
be like that item. 

The scoring of the Bills index also 
is more elaborate than that tradi- 
tional for rating scales. There are in 
fact two different measures, neither 
one being simply a rating of items in 
absolute terms, as in the scales previ- 
ously described. Bills’ measures de- 
pend instead upon the differences 
between ratings made under different 
instructions. A measure of self-ac- 
ceptance is provided by the degree of 
similarity between the way the sub- 
ject sees himself as being, and the 
way he rates himself as accepting his 
self-ratings. A measure of self- 
ideal-self discrepancy is given by 
comparing the differences in ratings 
between the way the self is rated as 
being, and the way the self is rated 
as wishing to be. 
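
The two difference scores described above can be sketched in a few lines of Python. This is only an illustration: the five-point scale, the sample ratings, and the use of summed absolute differences are assumptions made here, not Bills' published scoring rules.

```python
def summed_discrepancy(first, second):
    """Sum of absolute item-by-item differences between two sets of ratings."""
    if len(first) != len(second):
        raise ValueError("both rating lists must cover the same items")
    return sum(abs(a - b) for a, b in zip(first, second))


# Hypothetical ratings of five trait items on a 1-5 scale (illustrative data only).
self_ratings = [4, 2, 5, 3, 1]        # how well the item describes the subject
acceptance_ratings = [4, 3, 5, 2, 1]  # how acceptant the subject is of that self-rating
ideal_ratings = [5, 4, 5, 4, 3]       # how much the subject aspires to be like the item

# Smaller values mean closer agreement between the two sets of ratings.
self_vs_acceptance = summed_discrepancy(self_ratings, acceptance_ratings)
self_vs_ideal = summed_discrepancy(self_ratings, ideal_ratings)

print(self_vs_acceptance, self_vs_ideal)
```

Under these assumptions a low first value would be read as high self-acceptance, and a high second value as a large self-ideal-self discrepancy.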

Brownfain (1952) made still an- 
other adaptation in the use of the 
rating scale, deriving a measure of 
what he termed the stability of the
self-concept. Subjects ranked them- 
selves on 25 words and phrases, each 
describing a different area of per- 
sonality adjustment. The measure is 
not of how sure the subject is of him- 
self, but of how sure he is of what he 
thinks about himself; the subject is 
instructed to make the ratings twice, 
first with an optimistic frame of 
reference, and then with a pessimistic 
one. The degree of congruence be- 
tween the two ratings is termed the 
degree of stability of the self-concept. 

A different theoretical approach 
towards measurement of self-concept
involves the use of Q technique.
Stephenson (1953) describes how
one’s “inner experiences” can be
translated into behavior by means of 
Q sort, through which the phenome- 
nal field is translated into action. 
Using this method, two of Stephen- 
son’s students at the University of 
Chicago derived a conceptual self- 
system in an intensive study of a 
single subject (Edelson & Jones, 
1954). 

Others at the University of Chi- 
cago have used Q sorts as a measure 
of self-concept, in an attempt to 
assess changes in self-concept during 
psychotherapy (Rogers & Dymond, 
1954). Statements were taken from 
counseling protocols, and were sorted 
both for real self and for ideal self. 
The degree of congruence between 
the two sorts is taken as a measure of 
adjustment. 
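
As a rough sketch of how such congruence might be computed, the example below correlates two sorts of the same statements. The nine statements, the five piles, and the choice of a Pearson correlation as the index are assumptions for illustration; the passage above does not specify the exact statistic.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical pile placements (1 = least like me, 5 = most like me) for the
# same nine statements, sorted once for the real self and once for the ideal self.
self_sort = [1, 2, 2, 3, 3, 3, 4, 4, 5]
ideal_sort = [2, 1, 3, 2, 3, 4, 3, 5, 4]

congruence = pearson_r(self_sort, ideal_sort)
print(round(congruence, 2))
```

A value near 1 would indicate that the subject places statements in much the same piles for the real and the ideal self; a value near 0 would indicate little agreement between the two sorts.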

Attempts to measure the self-con- 
cept face three difficulties. First, it 
must be demonstrated that the 
operational and philosophic meanings 
are in fact equivalent. In the case of 
the self-concept it needs to be shown 
that the “inner experience” is effectively
conveyed by the outward
movement of making check marks on 
lines, or sorting cards. Secondly, an 
efficient and systematic method must 
be found for selecting items for the 
scales and sorts, the problem being 
that of defining the universe from 
which items are to be selected. Fi- 
nally, the different measures imply 
different operational definitions. Just 
as one can not multiply apples and 
pears, so is it impossible to inter- 
change different operational defini- 
tions as if they were the same, or to 
pretend that each means the same 
thing by the term self-concept. 

If something is measured does it 
exist? If the answer is yes, we must 
still be aware that we may not fully 
understand what we are measuring.
One must measure, but must then 
compare and carefully validate. 


VALIDATION OF SELF-CONCEPT 
MEASURES 


A psychological construct stands 
and falls according to how useful it is 
in understanding human behavior.
A term is meaningful only when 
successful validation studies have 
found significant relationships with 
established variables. 

It has been popular to validate 
self-concept scales against tests pur- 
porting to measure maladjustment 
in an attempt to demonstrate that 
one’s phenomenological view of the 
self is closely related to the degree of 
adjustment. Positive results abound. 
Calvin and Holtzman (1953) had 
college students rank themselves on 
seven personality traits, and found 
that self-depreciation was related to 
high scores on the MMPI. Zuckerman
and Manashkin (1957) had
neuropsychiatric patients rate themselves
on a scale of adjectives, and
found that self-ratings correlated
positively with the MMPI K scale,
and negatively with seven of the
other scales. Taylor and Combs
(1952) tested the hypothesis that 
sixth grade children found to be well- 
adjusted on the California personal- 
ity scale would more often admit 
statements of self-reference which 
though unflattering were universally 
true. They got positive results, the 
self-depreciation which in other self- 
concept measures is treated as vice 
being here treated as virtue. Hanlon, 
Hofstaetter, and O’Connor (1954) 
compared the results of high school 
juniors on the California personality 
scale with the degree of congruence 
between ratings of the real and ideal 
self and found that the more con- 
gruence the better the adjustment. 
Cowen (1954) related low self-ratings 
on the Brownfain negative self-concept
with high scores on the California
F Scale. Any doubt about the
ability of investigators to find posi- 
tive results when comparing good 
adjustment as measured by objective 
personality inventories with the 
affirmativeness of self-concept should 
be dispelled by a study by Smith 
(1958). He compared congruence 
between Q sorts for self and ideal 
self with scores on the Edwards PPS, 
the Cattell factors, and measures of 
average mood. After making almost 
300 correlations, he concluded that 
having a positive self-concept is in- 
deed related to adjustment. 

Other investigators have doubted 
that the relationship between adjust- 
ment and self-satisfaction is such a 
simple one. Block and Thomas 
(1955) conceived of maladjustment 
lying at both ends of the continuum. 
They felt that too high a degree of 
self-satisfaction is due to suppressive 
and repressive mechanisms which 
cause a person to be rigid, over-con- 
trolled, restrained, and aloof. But at 
the other extreme, the person who is 
too little satisfied with self will lack 
ego defenses, and will be able neither 
to bind tensions nor control emotions. 
Block and Thomas constructed an 
ego-control scale from MMPI items. 
The scale was found to have a correlation
of .44 with self-ideal-self Q
sort congruence, the relationship
being curvilinear. Unfortunately, 
this was the reverse of what Chodor- 
koff (1954a) had found. Correlating 
ratings of the self as made from a bio- 
graphical inventory with the results 
of projective techniques, he found 
that maladjustment lies in the middle 
range of self-satisfaction. 

Validating self-concept measures 
against objective personality tests 
has generally been successful, but the 
true significance of these studies is 
still not made clear. Edwards (1957)
demonstrates how more than half 
the variance in both MMPI scales 
and in Q sorts of self-referent items is 
accounted for by social desirability. 
SD can account for significant posi- 
tive relationships even when other 
variables are totally unrelated. Ed- 
wards’ SD robs these studies neither 
of significance nor of interest, but 
does suggest that extreme care must 
be taken in the labeling of constructs. 
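
The point that a shared social desirability component can produce a positive correlation between otherwise unrelated measures can be illustrated with a small simulation. Everything in it is an assumption made for illustration: the normal distributions, the equal weighting of the shared and unique parts, the sample size, and the random seed. It is not a reanalysis of Edwards' data.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_subjects = 5000

# Each simulated subject has one social desirability (SD) tendency and two
# components that are generated independently of each other and of SD.
sd = [rng.gauss(0, 1) for _ in range(n_subjects)]
unique_a = [rng.gauss(0, 1) for _ in range(n_subjects)]
unique_b = [rng.gauss(0, 1) for _ in range(n_subjects)]

# Hypothetical observed scores: SD contributes half the variance of each.
inventory_score = [s + a for s, a in zip(sd, unique_a)]
self_concept_score = [s + b for s, b in zip(sd, unique_b)]

r_unique = pearson_r(unique_a, unique_b)
r_observed = pearson_r(inventory_score, self_concept_score)

print(round(r_unique, 2), round(r_observed, 2))
```

In this setup the unique components correlate near zero, while the observed scores correlate at roughly .50 solely because both contain the same SD component.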

Attempts have also been made to 
validate self-concept against projec- 
tive personality tests. Bills has made 
several attempts to validate his scale 
by the Rorschach (Bills, 1953a, 1954; 
Bills, Vance, & McLean, 1951). The 
results are a bit ambiguous, and 
leave two observers (Cowen & 
Tongas, 1959) extremely dissatisfied. 
The TAT was used by Friedman 
(1955) to compare the Q sort discrepancy
self with the self as projected
onto the TAT pictures. The
normals were the only group to project
positive self-qualities. Neurotics
and paranoids both projected negatively.

A different approach to validation
has used a word association test. Results
show that there is a delayed
reaction time for those trait words 
where there has been a discrepancy 
in ratings between the self and the 
ideal self (Bills, 1953b; Roberts,
1952). Delayed associations are
assumed to be related to defensive- 
ness about self, which in turn is 
considered to be related to maladjust- 
ment. However, Cowen and Tongas 
(1959) wonder if defensiveness about 
trait words does not serve also to 
raise the original ratings of the actual 
self. 

Cowen chose to validate the self- 
concept by comparing the absolute 
self-rating with the learning time for 
the rated words, and found that 
there was a higher learning time for
words that were presumably threat- 
ening. We might however wonder if 
his and Tongas’ criticism of other 
studies does not apply here also: 
defensiveness might also cause self- 
ratings to be raised. 

Use was also made of the perceptual
New Look. Chodorkoff (1954b)
presented neutral and threatening
words with a tachistoscope, and
found that the better the agreement
between a self-description and a
description of the self by others, the 
less perceptual threat there will be. 

It is unfortunate that the only 
study of this general type that did 
not use college students as subjects 
was negative in its results. Zimmer
(1954) presented male mental pa- 
tients with trait adjectives on which 
there was a self-rating discrepancy 
between self and ideal self. A word 
association test was not found to be 
significantly related to self-discrep- 
ancy. 

The results of studies that involve 
the presentation of “hot” or threatening
words seem suggestive, for
there seems to be a common element 
in ability to free associate and learn 
threatening words. But it is possible 
that we have in these studies more a 
measure of ego defenses than of mal- 
adjustment, for the fact that the 
results are positive only with normal 
groups might suggest that the results 
are more relevant for a theory of 
personality than for a theory of 
psychopathology. In these studies it 
is indeed likely that we have support 
for Lecky’s theory of self-consistency,
and for Snygg and Combs’
theory of the maintenance of internal 
organization. If this is so, then likely 
it is true as Block and Thomas (1955) 
suggest that only extremes in ego 
control are pathological. 

A different approach to validation
of self-concept measures uses behavior
in a social situation as a
criterion. The most sweeping results 
in a study of this type are reported 
by Turner and Vanderlippe (1958), 
who report that Q sort congruence 
between the self and the ideal self is 
greater in those college students who 
are more active in extracurricular 
activities, have higher scholastic 
averages, and are given higher socio- 
metric rankings by fellow students. 
Holt (1951) found that agreement 
between self-ratings and ratings by a 
diagnostic council was positively re- 
lated to intelligent, active, adven- 
turous living, and a friendly domi- 
nant social adjustment. Eastman 
(1958) found that the degree of ac- 
ceptance of self-ratings on the Bills 
index is positively related to ratings 
for marital happiness. Working in 
terms of ratings for maladjustment, 
Chase (1957) found that among maladjusted
patients there was greater
discrepancy between Q sorts for self
as compared with sorts for the ideal 
self and the average other person. 

Other attempts to relate self-concept
to social behavior have been less
successful. Kelman and Parloff
(1957) obtained only chance results 
when they tried to interrelate such 
variables as congruence between self 
and ideal self, a symptom disability 
check list, a discomfort evaluation 
scale, sociometric ratings, and an in- 
effective behavior evaluation scale, 
using 15 neurotic hospital outpa- 
tients. Fiedler, Dodge, Jones, and 
Hutchins (1958) measured the self- 
concept of college students both by a 
simple rating scale and by a discrep- 
ancy measure. There was a general 
lack of correlation between these 
measures and such objective criteria 
as grade point average, health center 
visits, army adjustment, the Taylor 
MA scale, and sociometric status.
Coopersmith (1959) compared self- 
esteem as rated by the self with that 
estimated by observers, using chil- 
dren as subjects. He suggests that 
there are actually four types of self- 
esteem: what a person purports to 
have, what he really has, what he 
displays, and what others believe he 
has. 

There is no obvious explanation 
for the discrepancy of results in 
studies purporting to relate self- 
concept to behavioral adjustment. 
Since the basis for selecting items for 
rating scales and Q sorts differs from 
study to study, it is possible that the 
statements used in the scales of those 
studies with positive results had more 
of a relationship to the criteria than 
the statements in studies which were 
negative. 

A different approach to relating 
self-concept measures to adjustment 
is shown in a block of psychotherapy 
research studies at the University of 
Chicago (Rogers & Dymond, 1954). 
Change in self-concept was found to 
occur as a function of improvement 
during psychotherapy. Butler and 
Haigh (1954) had clients make Q 
sorts for self and for ideal self both 
before therapy and after its comple- 
tion to test the hypothesis that ther- 
apy will increase satisfaction with the 
self. Congruence between the two 
sorts increased as a result of psycho- 
therapy, the two sorts moving to- 
wards a common mean. Rudikoff 
(1954), using the same subjects,
found changes during periods of time 
before and after therapy were not 
nearly as great as those occurring 
during therapy. Also with the same 
subjects, Dymond (1954) found that 
there was closer agreement after 
therapy between the way clients 
sorted the Butler and Haigh Q sort 
cards, and the way two non-Rogerian
clinical psychologists sorted the cards 
between what the well-adjusted per- 
son should say is like him and what is 
not like him. 

The same investigator (Cart- 
wright, 1958) related change in self- 
concept over therapy to a successful 
search for identity. She had clients 
make sortings with Butler and Haigh 
Q sort cards to describe themselves as 
they saw themselves in relationship 
to three people of their choice to test 
the hypothesis that successful ther- 
apy increases the consistency of the 
self-concept which one brings to 
different social situations. The hy- 
pothesis was confirmed. 

Ewing (1954) had counselee col- 
lege students rate a list of traits for 
self, ideal self, mother, father, coun- 
selor, and a culturally approved 
figure. There was a regression of the 
ratings toward a common mean in 
those clients who were estimated to
be the most improved in therapy.

Changes in self-ratings over therapy
seem certainly to have occurred.
But they seem to take place also without
psychotherapy. Taylor (1955)
devised a Q sort divided between posi- 
tive and negative statements. After 
subjects made repeated sortings both 
for self and for ideal self, he concluded 
that self-introspection without ther- 
apy results in increased positiveness 
of attitude toward the self; that the 
self and ideal self will draw closer to- 
gether; and that repeated self-de- 
scriptions are accompanied by in- 
creased self-consistency. Engel (1959) 
studied the stability of self-concept 
in adolescence, and also found a 
trend towards more positive Q sort- 
ing over a 2-year period. And finally 
Dymond herself (1955) found an 
increased congruence between Q sorts 
for self and for ideal self among subjects
waiting for psychotherapy, although
ratings of adjustment based
on TAT protocols showed no change 
over the period. 

Dymond attributes increased
self-ideal-self congruence without
psychotherapy to the strengthening
of neurotic defenses. It might be
charged that similar changes during 
therapy might have the same basis. 
Dymond also raises the possibility 
that the sorts can be influenced by 
the attitude of the therapist towards 
the client’s self. There is in short no 
complete assurance that the cognitive 
self-acceptance as measured by the Q 
sort is related to the deeper level of 
self-integration that client-centered 
therapy seeks to achieve. 

Indirect evidence of change of the 
self-concept during counseling is pro- 
vided by studies showing changes of 
self-estimates. Several studies show 
that agreement between self-ratings 
on interests and the ratings of the self 
by interest inventories increases as a
result of counseling (Berdie, 1954;
Froehlich, 1954; Johnson, 1953;
Singer & Stefflre, 1954). The first two
of these studies show a moderate in- 
crease in accuracy in predicting one’s 
intelligence, but very little improve- 
ment in rating the self on measures 
of personality. One might reason that 
some parts of the self-concept are 
peripheral to the core of the self (e.g., 
interests) and are therefore unstable, 
while other parts (e.g., personality 
estimates) are central to the self and 
are therefore extremely resistant to 
change.


SELF-CONCEPT—SELF-CONSISTENCY

If the self-concept is to have useful- 
ness as a construct it must be shown 
that it is consistent in a given self. It 
must be known whether the self-con- 
cept is a gestalt that is more than the 
sum of different self-regarding attitudes,
or whether instead the
self-concept is an impossible attempt to
generalize different feelings toward 
unique situations. 

One answer to this question is pro- 
vided by Akeret (1959). He inter- 
correlated self-ratings on academic 
values, interpersonal relations, sex- 
ual adjustment, and emotional ad- 
justment, achieving differentially 
positive interrelationships. Emotional
adjustment was the best indicator,
correlating +.61 with a total
corrected for part-whole inflation.
While Akeret interpreted his results 
as suggesting that an individual does 
not accept or reject himself totally, 
the results might also be interpreted 
as suggesting that some areas of self- 
regard are more central to the self- 
concept than other areas. 
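
The part-whole correction mentioned above can be sketched as follows: instead of correlating one area with a total that already contains it, the area is correlated with the sum of the remaining areas. The ratings below are hypothetical, and this remainder-score approach is a common way of making the correction; it is an assumption that Akeret proceeded in the same manner.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical self-ratings of five subjects in four areas (illustrative only).
emotional = [1, 2, 3, 4, 5]
academic = [2, 1, 4, 3, 5]
interpersonal = [3, 3, 2, 4, 4]
sexual = [2, 4, 3, 3, 5]

# Total including the emotional area, and the remainder that excludes it.
remainder = [a + i + s for a, i, s in zip(academic, interpersonal, sexual)]
total = [e + r for e, r in zip(emotional, remainder)]

uncorrected = pearson_r(emotional, total)      # inflated: the part is inside the whole
corrected = pearson_r(emotional, remainder)    # part correlated with the rest only

print(round(uncorrected, 2), round(corrected, 2))
```

With these made-up ratings the corrected coefficient is lower than the uncorrected one, which is the inflation the correction is meant to remove.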

Consistency in the self-concept was 
found by Martire and Hornberger 
(1957), who found very great simi- 
larities between measures of the 
actual self, the ideal self, and a 
socially desirable self. But incon- 
sistency was found by McKenna, 
Hofstaetter, and O’Connor (1956), 
who found that one’s self ideal dif- 
fered less from one’s close friends 
than the close friends differed from 
each other. These investigators 
concluded by rather involved reason- 
ing that the ideal self is sufficiently 
differentiated to seek different need 
satisfactions in different people. 

The search for consistency in the 
self led also to comparing scores on 
different measures of self-concept. 
Omwake (1954) compared three scales 
—the Bills, Phillips, and Berger— 
which measure acceptance both of 
self and of others. The scales were in 
closer agreement as to the degree of 
acceptance of self than they were as 
to acceptance of others. Brownfain 
(1952) found that low ratings of self 
were related on his scale to the discrepancy
between optimistic and
pessimistic self-ratings, or what
Brownfain termed stability of
self-concept; and Cowen (1954) found a
relationship between the pessimistic
Brownfain self-ratings and the discrepancy
between self- and ideal-self
ratings on the Bills index. Bendig
and Hoffman (1957) found that
Bills’ scores on acceptance of self- 
ratings and on congruence between 
ratings of self and ideal self related 
equally well to scales of the Mauds- 
ley Personality Inventory. They 
therefore concluded that the two 
different Bills index measures are 
redundant. 

But on the negative side, Cowen 
(1956) found no relation between the 
so-called stability of self-concept on 
the Brownfain, and the different 
measures on the Bills. Hampton 
(1955) likewise failed to find any 
significant relationship between abil- 
ity to make realistic appraisals about 
oneself and the ability to admit state- 
ments that were damaging but prob- 
ably true. 

Different measures of the self- 
concept have different theoretical 
and operational bases. Where meas- 
ures apply similar rationale, signifi- 
cant correlations between measures 
have been found. But in similar 
measures such extraneous variables 
as response set and social desirability 
will produce similar bias. Measures 
of self-concept have reliability, and in 
a certain degree are interchangeable. 
Whether or not the reasons for simi- 
larity are intrinsic to the scales, the 
notion of the internal frame of refer- 
ence seems well validated. 


DISCUSSION


The scientist can not hold truths
to be self-evident. What is known of
the self through direct report must be
considered suspect due to philosophical
considerations, since the nature
of the “I” has been seen differently in
each ideological epoch. Notions con- 
cerning the self are like other human 
ideas, and are inventions and not 
discoveries. The task is not that of
discovering the “true self,” but instead
that of constructing those notions
which increase understanding of hu- 
man behavior. Just as the number of 
inventions is potentially unlimited, 
so there need be no limit on the num- 
ber of constructions put upon the 
self. In this discussion we will pro- 
ceed functionally, and consider the 
uses to which different selves have 
been put. 

The first self is the knowing self of 
structural psychology. Its function is 
to apprehend reality. The rational 
nature of man has always been in 
dispute, and the New Look in per- 
ception has further undermined this 
conception. This article has cited 
studies which throw doubt on the
ability of the self to perceive itself
correctly in those areas which are of
great value to it. It is the change in
the self as perceiver of itself that is 
the aim of client centered therapy. 
Studies of client centered therapy do 
not reveal whether therapy brings 
the client any closer to reality, but 
they do provide some evidence that 
the perception of the self is brought 
closer to social expectancies. 

The second construction of the self 
is that of motivator. This is the self 
of thinkers who believe that the 
individual is motivated by a need 
for self-assertion, or self-realization, 
by realizing those potentialities which 
inhere within the self. Attempts to
validate this construct of the self 
have been carried on through work 
on need achievement. This construct of 
the self seems involved also in ratings 
and Q sorts for an ideal self which 
out-distances the real self. Here, of 
course, the self whose reach exceeds 
its grasp is considered to be pathological,
for it is shown how psychotherapy
helps reduce the disparity
between the real and ideal. 

The third construct of self is the 
humanistic, semireligious conception 
of the self as that which experiences 
itself. It is the ‘unique personal 
experience’ of Moustakas (1957) and 
the experience of feeling in Rogers 
(1951). The difficulty for the psy- 
chologist is that such a conception is 
more religious than scientific; it becomes
a value-orientation, and, as
the writer has shown elsewhere
(Lowe, 1959), it becomes a highly
controversial statement of what is 
the highest good. 

The fourth approach views the
self as organizer. This self is the
psychoanalytic ego; the internal
frame of reference of Snygg and 
Combs (1949); and the source of 
construct making in G. A. Kelly 
(1955). Any operational measure of 
self-consistency would seem to imply 
the existence of such a self. It is this 
self that this article has been most 
directly concerned with; to the ex- 
tent that studies have been positive, 
the self does respond the same way in 
different situations. Conversely, to 
the extent that the studies have had 
negative results there is enough in- 
consistency in the self that it does not 
always act according to prediction. 

A fifth approach constructs the 
self as a pacifier. Such a self seems 
implied in Lewin (1936), who con- 
structed his system of personality in 
terms of valences or tensions which 
the organism seeks to keep to a minimum.
It seems present also in
Angyal (1941), who views life as an
oscillation about a position of equilib- 
rium. The self in other words is seen 
as an adjustment mechanism which 
seeks to maintain congruence be- 
tween the self and the nonself. It is 
the verification of this type of self 
that seems implied by Q sort studies
that show increased congruence of
real and ideal self as a result of psy- 
chotherapy. We must however note 
that the self as pacifier stands in 
direct opposition to the self as motivator.

In the sixth view of the self, the self 
is the subjective voice of the culture, 
being purely a social agent. It is the 
self of both sociology and S-R 
psychology, for it sees behavioral 
responses solely in terms of social 
conditions or stimulus inputs. The
self as an entity is denied, and be- 
havioral consistency is seen as resid- 
ing not in the individual but in simi- 
lar environmental events. If the term 
self is used, it is seen in terms of ego- 
involvements with loyalties which 
are determinative of the self. 

From these different conceptions 
of the self, we can choose the one 
which best fits our theoretical frame 
of reference. But which conception
is chosen seems to depend more upon
faith than upon logic, and the choice
of one conception must of necessity
deny other constructs. It seems im- 
possible that the self can function as 
a motivator which constantly tries to 
change the status quo, and as a paci- 
fier which minimizes the disparity 
between the real and ideal self. There 
is a contradiction also between the 
self as motivator and the self as feeling,
for in the latter the self is accepted
as it is, but in the former it is not.
Differences are apparent also between 
the self as feeling and as pacifier. 
And finally, the self as agent of society
is opposed to all other conceptions.


CONCLUSION

Is the self-concept a fact which, 
having an objective existence in na- 
ture, is observed and measured; or is 
it an epiphenomenon of deeper real- 
ity, invented by man that he might 
better study his behavior? 

The world has sought to be so sure 
of the self because there is so little 
else of which it can be certain. The 
self has become the anchor that man 
hopes will hold in the ebbtide of 
social change. But just as a fish 
could never know it was surrounded 
by water unless that water were to 
disappear, it is unlikely that Lecky 
(1945) would have known about self- 
consistency had he not lived in a 
culture which felt inconsistency. In 
Buberian terminology, the self is an 
It, which man invents because he 
can not find a Thou. 

The position of this paper must be 
that the self is an artifact which is 
invented to explain experience. If 
the self-concept is a tool, it must be 
well designed and constructed. We 
will conclude therefore with that 
construct of the self which best serves 
the 1960s. Such a construction com- 
bines the self of ego-involvement with 
the self of feeling. It is a self which is 
existential not to experience itself, 
but to mediate encounter between 
the organism and what is beyond. 
Such a self is what Pfuetze (1954) 
calls the “self-other dialogic theory
of the self,” being interpreted naturalistically
through Mead and
transcendentally through Buber. It 
is as an artifact that the self-concept 
finds meaning. 